Realtime Conversational AI
The Realtime Conversational AI system is the plugin’s most advanced feature, enabling natural, fluid, and fully interruptible (or “barge-in”) conversations with an AI. This system is designed to replicate the natural turn-taking of human conversation, creating deeply immersive and believable character interactions.
Core Concept: A Managed Conversation
Unlike simple request-response chats, a real-time conversation requires careful management of who is speaking, when they can be interrupted, and how to handle audio data for both the player and the AI.
This entire process is managed by an internal state machine within the plugin’s C++ classes. You don’t need to build complex state trackers in your Blueprints; you simply call the provided functions, and the plugin handles the turn-taking, audio buffering, and synchronization automatically.
- Barge-In Support: Users can interrupt the AI at any time by simply starting to speak.
- State Management: The plugin automatically handles the conversation state, including whose turn it is to speak.
- Phantom Audio Prevention: Intelligently discards outdated audio from the AI if it gets interrupted, preventing the user from hearing fragments of a cut-off sentence.
Blueprint Implementation Walkthrough
Setting up a real-time conversation involves three main stages: initializing the service, handling the user’s speech input, and processing the AI’s audio response.
1. Initialization (on Event BeginPlay)
First, you need to set up the service, create an audio component to play the AI’s voice, and give the AI its initial instructions.

- Create Realtime Service: This node creates the main service object. Promote the return value to a variable (e.g., RealtimeService) so you can access it later.
- Add Audio Component: This will be used to play the AI’s voice. Promote it to a variable as well (e.g., AI_AudioPlayer).
- Set System Instructions: This is crucial. It’s where you define the AI’s personality, role, and any other context it needs.
- Send Text To Server: Use this node to kick off the conversation with a starting phrase, like “Greetings.”
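
If you prefer working in C++, the same initialization can be sketched as below. The AMyConversationActor class and the UGenAIRealtimeService function names are assumptions that mirror the Blueprint nodes above; check the plugin’s headers for the actual API.

```cpp
// Hypothetical sketch: the class and function names mirror the Blueprint
// nodes (Create Realtime Service, Set System Instructions, Send Text To
// Server) and may not match the plugin's actual C++ API.
#include "Components/AudioComponent.h"

void AMyConversationActor::BeginPlay()
{
    Super::BeginPlay();

    // Create the main service object and keep a reference for later calls.
    RealtimeService = UGenAIRealtimeService::CreateRealtimeService(this);

    // Audio component that will play the AI's voice.
    AI_AudioPlayer = NewObject<UAudioComponent>(this);
    AI_AudioPlayer->RegisterComponent();

    // Define the AI's personality, role, and any other context it needs.
    RealtimeService->SetSystemInstructions(
        TEXT("You are a terse, battle-worn starship engineer."));

    // Kick off the conversation with a starting phrase.
    RealtimeService->SendTextToServer(TEXT("Greetings."));
}
```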

2. Handling User Input (Push-to-Talk)
The most common setup is a push-to-talk key. The logic is split between the key’s “Pressed” and “Released” events.

When the Push-to-Talk key is PRESSED:
This action signifies the user wants to speak, interrupting the AI if it’s currently talking.
- Stop: Immediately call Stop on your AI_AudioPlayer component to silence the AI.
- Clear Input Buffer: Call this on the RealtimeService. This tells the server to discard whatever it was about to say.
- Start Capturing Audio: Use Unreal’s built-in Start Capturing Audio node to begin recording from the user’s microphone.
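
A minimal C++ sketch of the pressed handler, reusing the hypothetical members from the initialization sketch plus a MicCapture component that stands in for whichever microphone-capture component your project uses:

```cpp
// Sketch only: ClearInputBuffer mirrors the Blueprint node above; the
// MicCapture member and its capture call are assumptions.
void AMyConversationActor::OnPushToTalkPressed()
{
    // Barge-in: silence the AI the moment the user starts speaking.
    AI_AudioPlayer->Stop();

    // Tell the server to discard whatever it was about to say.
    RealtimeService->ClearInputBuffer();

    // Begin recording from the user's microphone.
    MicCapture->StartCapturingAudio();
}
```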
When the Push-to-Talk key is RELEASED:
This signifies the user has finished speaking and is now waiting for the AI’s response.

- Stop Capturing Audio: Stop recording and retrieve the captured audio data.
- Convert Audio: Use the ConvertAudioToPCM16Mono24kHz helper to format the recorded audio correctly for the AI.
- Send Audio to Server: Feed the converted audio data into the Send Audio to Server node.
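
And the matching released handler. The helper and service signatures here are assumptions modeled on the Blueprint node names, including the return type of the capture call:

```cpp
// Sketch only: assumed signatures modeled on the Blueprint nodes
// (Stop Capturing Audio, Convert Audio, Send Audio to Server).
void AMyConversationActor::OnPushToTalkReleased()
{
    // Stop recording and collect the captured samples.
    TArray<uint8> Recorded = MicCapture->StopCapturingAudio();

    // Format the recording as 16-bit PCM, mono, 24 kHz for the AI service.
    const TArray<uint8> Converted =
        UGenAIAudioUtils::ConvertAudioToPCM16Mono24kHz(Recorded);

    // Hand the user's turn over to the server.
    RealtimeService->SendAudioToServer(Converted);
}
```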
3. Handling the AI’s Audio Response
The RealtimeService provides events that fire when the AI responds. You’ll use these to play the incoming audio.

- On Response Created: This event fires when the AI has processed the user’s speech and is about to start generating its own audio response. Here, you should create a new SoundWaveProcedural to hold the incoming audio.
- On Audio Received: This event fires repeatedly, providing chunks of audio data from the AI. Use the QueueAudioToProceduralWave helper to feed this data into the procedural sound wave you just created.
- Play Audio: Make sure your AI_AudioPlayer is playing the procedural sound wave.
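
Sketched in C++, with assumed delegate signatures and the UGenAIAudioUtils helpers described in the next section:

```cpp
// Sketch only: the delegate signatures are assumptions; the helper names
// mirror the Blueprint nodes documented under Audio Utility Helpers.
void AMyConversationActor::HandleResponseCreated()
{
    // A fresh procedural wave per response, so an interrupted reply
    // never leaves stale ("phantom") audio behind.
    CurrentWave = UGenAIAudioUtils::CreateEmptyProceduralWave();
    AI_AudioPlayer->SetSound(CurrentWave);
    AI_AudioPlayer->Play();
}

void AMyConversationActor::HandleAudioReceived(const TArray<uint8>& Chunk)
{
    // Append each incoming chunk for seamless streaming playback.
    UGenAIAudioUtils::QueueAudioToProceduralWave(CurrentWave, Chunk);
}
```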
Audio Utility Helpers
The plugin provides the GenAIAudioUtils class with several essential Blueprint nodes for real-time audio.

- Create Empty Procedural Wave: Creates a new, empty sound wave that can be played immediately and fed with audio data as it arrives.
- Queue Audio to Procedural Wave: Appends a chunk of raw audio data to an existing procedural sound wave for seamless playback.
- Convert Audio To PCM16 Mono 24kHz: A crucial converter that takes standard microphone input and formats it into the specific PCM format required by most AI services.
Best Practices
- Enable Input: If your conversational logic is in an Actor that isn’t a Pawn, you must use the Enable Input node (connected to a Get Player Controller) on BeginPlay to allow it to receive key presses (see the sketch after this list).
- Keep Blueprint Logic Simple: The C++ classes are designed to handle the complex state management. Your Blueprint logic should focus on connecting the events as shown above, not on tracking whose turn it is.
- Error Handling: Real-time systems can fail due to network issues. Build logic to handle connection errors gracefully, perhaps by attempting to reconnect or providing a fallback text-based chat.
- Memory Management: Be mindful of creating USoundWaveProcedural assets. Ensure they are properly managed by the garbage collector and not held onto indefinitely after a conversation ends.
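
For the Enable Input point, the C++ equivalent uses the standard engine calls EnableInput and UGameplayStatics::GetPlayerController; this sketch extends the hypothetical Actor from the earlier examples:

```cpp
#include "Kismet/GameplayStatics.h"

void AMyConversationActor::BeginPlay()
{
    Super::BeginPlay();

    // Without this, a non-Pawn Actor never receives the push-to-talk key.
    if (APlayerController* PC = UGameplayStatics::GetPlayerController(GetWorld(), 0))
    {
        EnableInput(PC);
    }
}
```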