Real-time Voice Chat API Documentation

WebSocket Endpoints

1. Media Stream Endpoint

URL: wss://voice-chat-socket.med2lab.com/media-stream
Protocol: WebSocket Secure (WSS)
Description: Handles real-time audio streaming between the client and OpenAI's voice model.

Query Parameters

Parameter   Type     Required   Default   Description
---------   ------   --------   -------   -------------------
token       string   Yes        -         Authorization token
topic_id    string   No         '102'     Topic identifier

Example Usage:

wss://voice-chat-socket.med2lab.com/media-stream?token=eyJhbGciOiJIUzI1NiIs...&topic_id=102

Connection Flow

  1. Client initiates a WSS connection with the required query parameters

  2. Server validates the token and completes the WebSocket handshake

  3. Bi-directional audio streaming begins

Audio Data Format

  • Client to Server (see the sketch after this list):

    • Raw audio buffer

    • 16-bit PCM

    • Sample Rate: 16kHz

    • Channels: Mono

    • Chunk Size: 4096 samples

  • Server to Client:

    • Audio: Base64 encoded audio data

    • Text: UTF-8 encoded transcripts
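
The client-to-server format above can be produced with a few lines of code. A minimal sketch, assuming input samples are floats in [-1, 1] and little-endian byte order (the byte order is not specified above, and the helper name is hypothetical):

import struct

SAMPLE_RATE = 16000      # 16 kHz, per the format above
CHUNK_SAMPLES = 4096     # samples per chunk, per the format above

def pcm_chunks(samples):
    """Yield 4096-sample chunks of mono 16-bit PCM bytes from float samples."""
    for start in range(0, len(samples), CHUNK_SAMPLES):
        window = samples[start:start + CHUNK_SAMPLES]
        yield struct.pack(
            f"<{len(window)}h",
            *(int(max(-1.0, min(1.0, s)) * 32767) for s in window),
        )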

2. Chat Stream Endpoint

URL: wss://voice-chat-socket.med2lab.com/chat-stream
Protocol: WebSocket Secure (WSS)
Description: Handles real-time chat interactions with voice capability.

Query Parameters

Parameter   Type     Required   Default   Description
---------   ------   --------   -------   -------------------
token       string   Yes        -         Authorization token
CaseId      string   No         '653'     Case identifier

Example Usage:

wss://voice-chat-socket.med2lab.com/chat-stream?token=eyJhbGciOiJIUzI1NiIs...&CaseId=653

Special Response Markers

The system includes special markers in responses:

  • [[EMAIL]]: Added when the user mentions email

  • [[ENDCHAT*Feedback*Chat xong rồi !!!]]: Added when the user mentions going to sleep ("Chat xong rồi !!!" is Vietnamese for "The chat is finished!!!")
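
A minimal client-side sketch for detecting these markers in a transcript; the marker strings come from the list above, while the handler name and the stripping behavior are illustrative assumptions:

import re

# Matches the ENDCHAT marker, e.g. [[ENDCHAT*Feedback*Chat xong rồi !!!]]
ENDCHAT_RE = re.compile(r"\[\[ENDCHAT\*(?P<label>[^*]*)\*(?P<message>[^\]]*)\]\]")

def handle_transcript(text):
    """Strip special markers from a transcript and react to them."""
    if "[[EMAIL]]" in text:
        print("Marker: user mentioned email")
        text = text.replace("[[EMAIL]]", "")
    match = ENDCHAT_RE.search(text)
    if match:
        print(f"Marker: end of chat ({match.group('message')})")
        text = ENDCHAT_RE.sub("", text)
    return text.strip()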

Client Integration Examples

ReactJS Example

import { useEffect, useRef } from 'react';

const VoiceChatComponent = () => {
  const wsMediaRef = useRef(null);
  const wsChatRef = useRef(null);
  const token = 'your_auth_token';
  const host = 'voice-chat-socket.med2lab.com';

  useEffect(() => {
    // Media Stream Connection
    wsMediaRef.current = new WebSocket(
      `wss://${host}/media-stream?token=${token}&topic_id=102`
    );
    wsMediaRef.current.binaryType = 'arraybuffer';

    // Chat Stream Connection
    wsChatRef.current = new WebSocket(
      `wss://${host}/chat-stream?token=${token}&CaseId=653`
    );
    wsChatRef.current.binaryType = 'arraybuffer';

    // Event Handlers
    wsMediaRef.current.onmessage = (event) => {
      if (typeof event.data === 'string') {
        console.log('Received transcript:', event.data);
      } else {
        // Handle binary audio data
        const audioData = event.data;
        // Process audio...
      }
    };

    wsMediaRef.current.onerror = (error) => {
      console.error('WebSocket Error:', error);
    };

    // Cleanup on component unmount
    return () => {
      wsMediaRef.current?.close();
      wsChatRef.current?.close();
    };
  }, []);

  return <div>Voice Chat Component</div>;
};

export default VoiceChatComponent;

Python Example

import asyncio
import websockets

async def connect_voice_chat():
    token = "your_auth_token"
    host = "voice-chat-socket.med2lab.com"
    
    # Media Stream Connection
    async with websockets.connect(
        f"wss://{host}/media-stream?token={token}&topic_id=102"
    ) as ws_media:
        try:
            # Send audio data
            audio_chunk = b"..." # Your audio data here
            await ws_media.send(audio_chunk)
            
            # Receive response
            while True:
                response = await ws_media.recv()
                if isinstance(response, str):
                    # Handle text transcript
                    print(f"Received transcript: {response}")
                else:
                    # Handle binary audio data
                    audio_data = response
                    # Process audio...
                    
        except websockets.exceptions.ConnectionClosed:
            print("Connection closed")

# Run the async function
asyncio.run(connect_voice_chat())
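
The example above covers only the media-stream endpoint. A corresponding chat-stream connection might look like the following sketch; the exact payload the endpoint expects for outgoing messages is not documented here, so the plain-text send is an assumption:

import asyncio
import websockets

async def connect_chat_stream():
    token = "your_auth_token"
    host = "voice-chat-socket.med2lab.com"

    async with websockets.connect(
        f"wss://{host}/chat-stream?token={token}&CaseId=653"
    ) as ws_chat:
        # Send a user message (payload format assumed; adjust as needed)
        await ws_chat.send("Hello")
        try:
            while True:
                response = await ws_chat.recv()
                if isinstance(response, str):
                    # Text responses may contain the special markers above
                    print(f"Chat response: {response}")
        except websockets.exceptions.ConnectionClosed:
            print("Chat connection closed")

asyncio.run(connect_chat_stream())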

C# Example

using System;
using System.IO;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public class VoiceChatClient
{
    private ClientWebSocket _mediaWs;
    private ClientWebSocket _chatWs;
    private readonly string _token;
    private readonly string _host;
    
    public VoiceChatClient(string token)
    {
        _token = token;
        _host = "voice-chat-socket.med2lab.com";
        _mediaWs = new ClientWebSocket();
        _chatWs = new ClientWebSocket();
    }
    
    public async Task ConnectAsync()
    {
        // Connect to Media Stream
        var mediaUri = new Uri($"wss://{_host}/media-stream?token={_token}&topic_id=102");
        await _mediaWs.ConnectAsync(mediaUri, CancellationToken.None);
        
        // Connect to Chat Stream
        var chatUri = new Uri($"wss://{_host}/chat-stream?token={_token}&CaseId=653");
        await _chatWs.ConnectAsync(chatUri, CancellationToken.None);
        
        // Start receiving messages on both sockets
        _ = ReceiveMessagesAsync(_mediaWs);
        _ = ReceiveMessagesAsync(_chatWs);
    }
    
    private async Task ReceiveMessagesAsync(ClientWebSocket ws)
    {
        var buffer = new byte[4096];
        
        try
        {
            while (ws.State == WebSocketState.Open)
            {
                // Messages larger than the buffer arrive in fragments;
                // accumulate until EndOfMessage before processing.
                using var message = new MemoryStream();
                WebSocketReceiveResult result;
                do
                {
                    result = await ws.ReceiveAsync(
                        new ArraySegment<byte>(buffer), CancellationToken.None);
                    message.Write(buffer, 0, result.Count);
                } while (!result.EndOfMessage);

                if (result.MessageType == WebSocketMessageType.Text)
                {
                    var text = Encoding.UTF8.GetString(message.ToArray());
                    Console.WriteLine($"Received text: {text}");
                }
                else if (result.MessageType == WebSocketMessageType.Binary)
                {
                    // Handle binary audio data
                    var audioData = message.ToArray();
                    // Process audio...
                }
                else if (result.MessageType == WebSocketMessageType.Close)
                {
                    await ws.CloseAsync(WebSocketCloseStatus.NormalClosure, 
                        string.Empty, CancellationToken.None);
                    break;
                }
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error: {ex.Message}");
        }
    }
    
    public async Task SendAudioAsync(byte[] audioData)
    {
        if (_mediaWs.State == WebSocketState.Open)
        {
            await _mediaWs.SendAsync(
                new ArraySegment<byte>(audioData),
                WebSocketMessageType.Binary,
                true,
                CancellationToken.None);
        }
    }
    
    public async Task CloseAsync()
    {
        if (_mediaWs.State == WebSocketState.Open)
            await _mediaWs.CloseAsync(WebSocketCloseStatus.NormalClosure, 
                string.Empty, CancellationToken.None);
        
        if (_chatWs.State == WebSocketState.Open)
            await _chatWs.CloseAsync(WebSocketCloseStatus.NormalClosure, 
                string.Empty, CancellationToken.None);
    }
}

// Usage example:
// var client = new VoiceChatClient("your_auth_token");
// await client.ConnectAsync();
// await client.SendAudioAsync(audioData);
// await client.CloseAsync();

Error Handling

The server may close the connection in the following cases:

  • 1008 (policy violation): authorization errors (missing/invalid token, missing API key)

  • Connection errors with the OpenAI service

  • Internal server errors
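
On the client side, the close code can be inspected to decide whether to retry. A minimal sketch using the websockets library (version 10 or later, where the received close frame is exposed as exc.rcvd); treating 1008 as non-retryable is an assumption:

import asyncio
import websockets

async def connect_with_error_handling(url):
    try:
        async with websockets.connect(url) as ws:
            async for message in ws:
                print(message)
    except websockets.exceptions.ConnectionClosedError as exc:
        close = exc.rcvd  # close frame received from the server, if any
        if close is not None and close.code == 1008:
            print("Authorization failed; obtain a new token before reconnecting")
        else:
            print(f"Connection closed abnormally: {close}")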

Security Considerations

  1. All connections must use WSS (WebSocket Secure)

  2. Token-based authentication is required for all endpoints

  3. The server enforces error handling and connection timeouts
