Via WebSockets

WebSocket conversations support the same features as webhook conversations but are much more flexible. Because you run your own server that forwards audio chunks between Phonic and your audio interface, you can implement offline tools and use any audio interface you like, whether telephony, web, or otherwise.

In this guide we’ll demonstrate the basic mechanics of WebSocket conversations. The example code for this guide is available here.

Open a WebSocket connection to Phonic

The first step is to open a WebSocket connection to Phonic. This connection carries several types of messages, detailed here.

On the receiving side, the most important message is the audio_chunk_response, which contains the Phonic-generated audio that your server will forward to your audio interface, along with its transcript.

Similarly, on the sending side, the most important message is the audio_chunk, which contains the user audio that your server will forward to Phonic.
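As a quick preview of the two directions (a fragment drawn from the full example below; forwardToAudioInterface and userAudioBase64 are placeholder names for your own audio-interface code), the exchange looks roughly like this:

import { PhonicClient } from "phonic";
import { phonicApiKey } from "./env-vars";

const phonicClient = new PhonicClient({ apiKey: phonicApiKey });
const phonicSocket = await phonicClient.conversations.connect();

// Receiving: forward Phonic-generated audio to your audio interface.
phonicSocket.on("message", (message) => {
  if (message.type === "audio_chunk") {
    forwardToAudioInterface(message.audio); // placeholder for your playback code
  }
});

// Sending: forward user audio from your audio interface to Phonic.
await phonicSocket.sendAudioChunk({
  type: "audio_chunk",
  audio: userAudioBase64, // placeholder for audio captured from the user
});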

Here is an example implementation of this WebSocket connection:

import { createNodeWebSocket } from "@hono/node-ws";
import { Hono } from "hono";
import type { WSContext } from "hono/ws";
import { PhonicClient } from "phonic";
import { phonicApiKey } from "./env-vars";

const app = new Hono();
const phonicClient = new PhonicClient({
  apiKey: phonicApiKey,
});

const { injectWebSocket, upgradeWebSocket } = createNodeWebSocket({ app });

app.get(
  "/ws",
  upgradeWebSocket(() => {
    let phonicSocket: Awaited<
      ReturnType<typeof phonicClient.conversations.connect>
    > | null = null;
    let streamSid: string | null = null;

    const sendToTwilio = (ws: WSContext, data: unknown) => {
      ws.send(JSON.stringify(data));
    };

    return {
      async onOpen(_, ws) {
        phonicSocket = await phonicClient.conversations.connect();

        // Forward Phonic-generated audio to Twilio.
        phonicSocket.on("message", (message) => {
          if (streamSid && message.type === "audio_chunk") {
            sendToTwilio(ws, {
              event: "media",
              streamSid: streamSid,
              media: {
                payload: message.audio,
              },
            });
          }
        });

        await phonicSocket.sendConfig({
          type: "config",
          agent: "agent-websocket",
        });
      },

      async onMessage(event, ws) {
        const message = event.data;
        if (typeof message !== "string") return;

        const data = JSON.parse(message);

        switch (data.event) {
          case "start":
            streamSid = data.streamSid;
            break;

          case "media":
            // Forward user audio from Twilio to Phonic.
            if (phonicSocket && data.media.track === "inbound") {
              await phonicSocket.sendAudioChunk({
                type: "audio_chunk",
                audio: data.media.payload,
              });
            }
            break;

          case "stop":
            ws.close();
            break;
        }
      },

      onClose() {
        console.log("Twilio WebSocket closed");

        if (phonicSocket) {
          phonicSocket.close();
        }
      },
    };
  }),
);
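Note that createNodeWebSocket returns injectWebSocket, which must be attached to the Node server for the /ws upgrade to work. A minimal sketch of the server bootstrap, assuming @hono/node-server and an arbitrary port, might look like this:

import { serve } from "@hono/node-server";

// Attach the WebSocket handler to the running Node server so that
// upgrade requests to /ws are handled (the port is arbitrary).
const server = serve({ fetch: app.fetch, port: 3000 });
injectWebSocket(server);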

Open a connection with your audio interface

Now that your server is set up to exchange messages with Phonic, you can use any audio interface for the user. Here is a simple example using Twilio:

import VoiceResponse from "twilio/lib/twiml/VoiceResponse";

app.post("/inbound", (c) => {
  const url = new URL(c.req.url);
  const response = new VoiceResponse();

  response.connect().stream({
    url: `wss://${url.host}/ws`,
  });

  return c.text(response.toString(), 200, { "Content-Type": "text/xml" });
});
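To connect a call, point your Twilio phone number's incoming-call webhook at this /inbound endpoint. Twilio fetches the TwiML above, opens a WebSocket to wss://<your-host>/ws, and starts sending the start, media, and stop events handled in the previous section.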
See here for a complete code example.