Microsoft Web-Chat + SpeechSynthesizer (for text highlight or visemes)

I'm trying to add features to a bot that uses Web Chat with speech and real-time text highlighting (or visemes).

On one hand, I have this working sample, which can select a voice, synthesize text, and exposes several events (synthesisStarted, wordBoundary, visemeReceived, etc.): https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/browser/synthesis.html

On the other hand, I have this second working sample: https://microsoft.github.io/BotFramework-WebChat/03.speech/b.cognitive-speech-services-js/

The problem I encounter is that the second sample, even though it also relies on Cognitive Speech Services, seems to use the speech capabilities more on the "server side" than on the "client side". For example, to change the voice I had to modify the server-side code to generate SSML with a voice element; I haven't found a way to do it on the client side!

Put differently: with Web Chat, is there a way to manage speech synthesis from the client side with the same SpeechSDK.SpeechSynthesizer class and its existing JavaScript events, like wordBoundary or the new visemeReceived?
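For reference, a minimal sketch of the client-side pattern from the first sample, with the events in question. The key/region values and voice name are placeholders, and the SpeechSDK global comes from the browser bundle:

```javascript
// Sketch of the standalone (client-side) Speech SDK pattern, assuming the
// browser bundle exposes the SpeechSDK global; key/region are placeholders.

// Speech SDK events report offsets in ticks (100 ns); convert to milliseconds.
function ticksToMs(ticks) {
  return ticks / 10000;
}

if (typeof SpeechSDK !== 'undefined') {
  const speechConfig = SpeechSDK.SpeechConfig.fromSubscription('YOUR_KEY', 'YOUR_REGION');
  speechConfig.speechSynthesisVoiceName = 'en-US-JennyNeural';

  const synthesizer = new SpeechSDK.SpeechSynthesizer(speechConfig);

  // Fired for each word boundary during synthesis.
  synthesizer.wordBoundary = (s, e) => {
    console.log(`wordBoundary: "${e.text}" at ${ticksToMs(e.audioOffset)} ms`);
  };

  // Fired for each viseme (lip-sync) event.
  synthesizer.visemeReceived = (s, e) => {
    console.log(`viseme ${e.visemeId} at ${ticksToMs(e.audioOffset)} ms`);
  };

  synthesizer.speakTextAsync('Hello world', () => synthesizer.close());
}
```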


Addition: I can also confirm that if you modify the 03.speech/e.select-voice sample and remove/comment the lines

  selectVoice: (voices, activity) =>
    activity.locale === 'zh-HK'
      ? voices.find(({ name }) => /TracyRUS/iu.test(name))
      : voices.find(({ name }) => /JessaNeural/iu.test(name)) ||
        voices.find(({ name }) => /Jessa/iu.test(name)),
 

the sample still works and you hear English and Japanese, because the voices are specified in the SSML received from the bot!
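For context, the voice travels in the SSML the bot puts in the activity's speak property. A hypothetical sketch of how such a speak string can be built (the helper name and voice are illustrative, not from the sample):

```javascript
// Illustrative helper: build an SSML "speak" string with an explicit voice,
// as a bot can send in an activity's speak property for Web Chat to play.
function buildSpeak(text, voiceName, lang) {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    `<voice name="${voiceName}">${text}</voice>` +
    `</speak>`
  );
}

console.log(buildSpeak('Hello!', 'en-US-JessaNeural', 'en-US'));
```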

Upvotes: 0

Views: 517

Answers (1)

Steven Kanberg

Reputation: 6393

Cognitive Services Speech is baked into Web Chat as one of its bundled components, so it's not so much running server-side as it is embedded in the CDN package. The reason you are able to pass SSML-marked-up activities from the bot is that the Web Chat implementation is configured to read any markup presented in an activity.

That being said, you don't have to specify the voice from within the bot. As the 03.speech/e.select-voice Web Chat sample shows, you can do so from within Web Chat's renderer.
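For reference, selectVoice is a plain function that receives the available voices and the current activity; a sketch mirroring the sample's logic (the mock call at the end is illustrative):

```javascript
// selectVoice as in the 03.speech/e.select-voice sample: choose a voice
// based on the activity's locale. It is passed as a prop to renderWebChat.
const selectVoice = (voices, activity) =>
  activity.locale === 'zh-HK'
    ? voices.find(({ name }) => /TracyRUS/iu.test(name))
    : voices.find(({ name }) => /JessaNeural/iu.test(name)) ||
      voices.find(({ name }) => /Jessa/iu.test(name));

// Illustrative use with mock voice objects:
const voices = [{ name: 'zh-HK-TracyRUS' }, { name: 'en-US-JessaNeural' }];
console.log(selectVoice(voices, { locale: 'zh-HK' }).name); // 'zh-HK-TracyRUS'
console.log(selectVoice(voices, { locale: 'en-US' }).name); // 'en-US-JessaNeural'
```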

As far as wordBoundary is concerned, it does not appear to be supported at this time. Looking in the SDK, in createCognitiveServicesSpeechServicesPonyfillFactory.js, there is no reference to wordBoundary, nor anywhere else in the project.

If this is a feature you would like to see in Web Chat, I would suggest you open a feature request in the repo for inclusion in a future release.

Upvotes: 1
