KB113: Speech Tags

What Are Speech Tags?

You can use Speech Tags wherever you can specify text to be spoken using Text to Speech (TTS). Speech Tags can change the quality of the voice itself or the pronunciation of a word. Some tags, such as [silence 1s], are used alone, but most come as an opening and closing pair that wraps a word or phrase, such as [spell]123[/spell].

You can use the Script panel's Play button to listen to your tagged text and make sure it sounds right. If you hear the voice speak the text of a tag, the tag was not recognized as a tag; check its spelling against the reference below. Be sure that each opening tag has a corresponding closing tag. If tags are not properly matched, you may not hear the voice at all.
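
The matching rule above can also be checked before you press Play. The following Python sketch is not part of the product; it is a minimal illustration that assumes [silence] is the only standalone tag, that every other tag is paired, and that nested tags must close in reverse order. The name find_tag_problems is hypothetical.

```python
import re

# Assumption: [silence] is the only tag used alone; all others are paired.
STANDALONE = {"silence"}

# Matches [name], [name args], and [/name].
TAG_RE = re.compile(r"\[(/?)([A-Za-z][\w-]*)([^\]]*)\]")


def find_tag_problems(text):
    """Return a list of human-readable tag pairing problems (empty if none)."""
    problems = []
    stack = []  # names of tags that are open and not yet closed
    for match in TAG_RE.finditer(text):
        closing = match.group(1) == "/"
        name = match.group(2).lower()
        if name in STANDALONE:
            if closing:
                problems.append(f"[/{name}] is not needed: [{name}] is used alone")
            continue
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append(f"unexpected closing tag [/{name}]")
    # Anything still on the stack was opened but never closed.
    problems.extend(f"missing closing tag [/{name}]" for name in reversed(stack))
    return problems


print(find_tag_problems("[angry]That cannot be."))
```

Because the check treats any bracketed word as a tag, it will also flag ordinary bracketed text and misspelled tag names; treat its output as a hint, not as a definitive verdict.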

Speech Tag Reference

Here is a list of available tags.

Tag	Description	Example
[affectionate], [angry], etc.	Specifies a voice style when using compatible Microsoft voices.	[angry]That cannot be.[/angry]
[english], [french], etc.	When using a voice in one language, tags a word or phrase that should be spoken using the rules of a different language.	Voici le service [english]People Builder[/english].
[ipa]	Pronounces the enclosed word using IPA phonetic spelling.	[ipa pɪˈkɑːn]pecan[/ipa]
[silence]	Specifies a pause in the speech. Give the duration in seconds (s) or milliseconds (ms). With no argument, the pause is 1 second.	[silence] [silence 1.5s] [silence 500ms]
[pitch]	Sets the pitch for a word or phrase. As an argument, you can use the word default, x-low, low, medium, high, x-high, or a relative value in %.	[pitch high]high[/pitch] [pitch -5%]lower[/pitch]
[rate]	Sets the rate for a word or phrase. As an argument, you can use the word default, x-slow, slow, medium, fast, x-fast, or a value between 20% and 200%.	[rate slow]slow[/rate] [rate 150%]faster[/rate]
[sampa]	Pronounces the enclosed word using X-SAMPA phonetic spelling.	[sampa pI"kA:n]pecan[/sampa]
[spell]	Spells a word instead of speaking it normally.	The letter [spell]a[/spell].
[spoken]	Use this tag to wrap text as it should be spoken. This text will not appear in a user-facing transcript. Almost always used next to [written].	[spoken]woostershur[/spoken]
[volume]	Sets the volume for a word or phrase. As an argument, you can use the word default, silent, x-soft, soft, medium, loud, x-loud, or a value such as +ndB, -ndB.	[volume soft]quiet[/volume]
[written]	Use this tag to wrap text as it should be written. This text may appear in a user-facing transcript, but is never spoken. Almost always used next to [spoken].	[written]Worcestershire[/written]
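
The [spoken] and [written] rows describe two views of the same text: what the voice says and what a transcript shows. The Python sketch below illustrates that split under stated assumptions; it is not the product's implementation, the name split_spoken_written is hypothetical, and it handles only these two tags, leaving any other tags in place.

```python
import re


def split_spoken_written(text):
    """Return (spoken_text, transcript_text) for text using [spoken]/[written]."""
    # Spoken view: drop [written]...[/written] spans, then unwrap [spoken].
    spoken = re.sub(r"\[written\].*?\[/written\]", "", text, flags=re.S)
    spoken = re.sub(r"\[/?spoken\]", "", spoken)
    # Transcript view: drop [spoken]...[/spoken] spans, then unwrap [written].
    transcript = re.sub(r"\[spoken\].*?\[/spoken\]", "", text, flags=re.S)
    transcript = re.sub(r"\[/?written\]", "", transcript)
    return spoken, transcript


line = "Add [spoken]woostershur[/spoken][written]Worcestershire[/written] sauce."
spoken, transcript = split_spoken_written(line)
print(spoken)      # Add woostershur sauce.
print(transcript)  # Add Worcestershire sauce.
```

Placing the two tags next to each other, as in the sample line, keeps the spoken and written forms aligned at the same position in the sentence.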