Prompting guide

This guide walks you through best practices for structuring your system prompt to maximize intelligence, pronunciation, and naturalness. Our models are post-trained for specific customer use cases, and this guide should help you get the most out of them.

Spelling out letters/IDs

A common use case is spelling usernames or emails back to the user. We recommend adding the following language:

Spell the email back slowly to the caller to confirm. Do this by spelling letter by letter using caps with spaces. For example: “C A P S W I T H S P A C E S 1 2 3 @ gmail dot com”

Avoid adding punctuation or periods to delimit letters; the model should spell out individual capitalized letters slowly.

The model also recognizes alphanumeric IDs, so an ID such as “BKDJ134” will be read out letter by letter, with no need to separate the characters with spaces.
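To make the target format concrete, the sketch below produces the spelled-out form used in the example prompt for a given email address. It is a hypothetical helper (`spell_email` is not part of any API described in this guide) that may be useful for generating few-shot examples for your prompt. Reading a period in the local part as the word "dot" is an assumption made here, chosen to follow the advice against punctuation between letters.

```python
def spell_email(email: str) -> str:
    """Format an email the way the example prompt spells it out.

    Hypothetical helper: the part before "@" becomes capital letters and
    digits separated by spaces; the domain is kept as words joined by "dot".
    """
    local, _, domain = email.partition("@")
    # Assumption: a "." in the local part is spoken as the word "dot"
    # rather than kept as punctuation between letters.
    spelled = " ".join("dot" if ch == "." else ch.upper() for ch in local)
    spoken_domain = domain.replace(".", " dot ")
    return f"{spelled} @ {spoken_domain}"


print(spell_email("capswithspaces123@gmail.com"))
```

The printed string can be pasted into the prompt as the "For example" line, or used to build several examples from real addresses in your domain.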

Emails

To have emails spelled out letter by letter, refer to the prior section. To have them read out naturally, simply provide them as-is; the model is post-trained to recognize them. For example, “sarah.martinez@techcorp.io” will be read out as “sarah” “dot” “martinez” “at” “tech” “corp” “dot” “I” “O” without needing any formatting changes.

Phone numbers

The model natively understands phone numbers, so a number written as “1234567890” will be read out in groups as “123”, “456”, “7890” (especially if the conversation context indicates that the number is a phone number). You do not need to add anything to the prompt to get this behavior.

Not speaking

To instruct the model not to speak, add this language to your prompt:

Please output precisely “(())” when the user input is a statement that warrants a silent response. “(())” is a substitute for a silent response. Then, immediately finish the response output to end your turn. It is forbidden to say anything after “(())”.

We explicitly recognize this pattern and handle the silent response.
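Because the marker is handled for you, no client code is required. If you review transcripts or write tests against raw model output, a check like the following can flag silent turns. This is a minimal sketch: `is_silent_response` is a hypothetical helper, and ignoring surrounding whitespace is an assumption.

```python
SILENT_MARKER = "(())"  # the marker the prompt tells the model to output


def is_silent_response(model_output: str) -> bool:
    """Return True when a turn consists only of the silent marker.

    Hypothetical helper for transcript review or tests. Assumption:
    leading and trailing whitespace around the marker is not significant.
    """
    return model_output.strip() == SILENT_MARKER


turns = ["(())", "Sure, one moment.", " (()) "]
silent_flags = [is_silent_response(t) for t in turns]
```

A turn that contains the marker followed by more text is not counted as silent here, which makes it easy to spot outputs that break the "nothing after the marker" rule.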

Disfluencies

Disfluencies can greatly improve the naturalness of the conversation, and the model is post-trained to say them naturally. Here is one example of language you can add to your prompt:

Be conversational. Use disfluencies and simulate real human speech. Speak naturally and casually, using simple language with occasional filler words. Avoid sounding too formal or robotic.

Providing examples of specific responses to common queries that include the right casual language can also be helpful.

Avoid robotic repetition

If the agent sounds repetitive or overuses the same phrases, include an explicit variety instruction:

Do not repeat the same sentence twice. Vary your responses so they don’t sound robotic.

Speed

To control the overall speed of the speech, we expose a speed slider/parameter.

The model will also infer from context when to slow down or speed up. For example, if the user asks the agent to say something more slowly, it should respond at a slower pace.

At times, though, you may want to control the speed explicitly for certain turns. In that case, we recommend using the [slower] and [faster] tags:

When reading a URL aloud, include the “[slower]” tag before the URL. For example, “[slower] mysite.com/info”
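If some of the agent's lines are scripted or templated rather than generated, the tag could also be added in code. The sketch below shows one way to do that, assuming the tag is honored the same way in scripted text. `tag_urls` is a hypothetical helper, and the URL pattern is deliberately simple (it looks for dotted host names and skips email addresses), so treat it as a starting point rather than a complete URL matcher.

```python
import re

# Simplified URL pattern (an assumption, not a complete URL grammar):
# an optional scheme, a dotted host name, and an optional path.
# The lookbehinds skip URLs that are already tagged or that start mid-token;
# the lookahead skips the local part of email addresses.
URL_PATTERN = re.compile(
    r"(?<!\[slower\] )(?<![\w@./-])"
    r"(?:https?://)?(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}"
    r"(?![\w-]*@)(?:/\S*)?"
)


def tag_urls(text: str) -> str:
    """Insert the "[slower]" tag before each URL found in the text."""
    return URL_PATTERN.sub(lambda m: f"[slower] {m.group(0)}", text)


print(tag_urls("You can find details at mysite.com/info"))
```

Running the helper twice leaves already tagged URLs unchanged, so it is safe to apply at more than one stage of a templating pipeline.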