Reputation: 133
I was able to get short dictation answers from the REST API of Bing Voice Recognition. My goal is to get responses for audio files that are longer than 15-30 seconds (i.e. long dictation mode). To get the short answers, I do the following (I'm developing an HTML UWP app):
Get an ArrayBuffer from an audio file (wav)
var accessToken = [[accessToken]];
var url = 'https://speech.platform.bing.com/recognize?';
// Query-string parameters for the recognize endpoint.
// guid() is a helper (not shown here) that returns a new GUID string.
var params = {
    'version': '3.0',
    'format': 'json',
    'locale': 'en-US',
    'device.os': 'Windows OS',
    'scenarios': 'smd',
    'appid': 'D4D52672-91D7-4C74-8AD8-42B1D98141A5',
    'requestid': guid(),
    'instanceid': guid()
};
var options = {
    url: url + $.param(params),
    type: "POST",
    headers: {
        'Authorization': 'Bearer ' + accessToken,
        'Content-Type': 'audio/wav; samplerate=16000'
    },
    // data is the audio payload (the wav ArrayBuffer)
    data: data
};
return WinJS.xhr(options);
So this works! But how can I do this for long dictation scenarios?
Please don't reference the JavaScript GitHub repository at https://github.com/microsoft/Cognitive-Speech-STT-Javascript. It only works for short dictation and does not work in the Edge browser.
Upvotes: 1
Views: 845
Reputation: 730
From API documentation:
Your application must endpoint the audio to determine start and end of speech, which in turn is used by the service to determine the start and end of the request. You may not upload more than 10 seconds of audio in any one request and the total request duration cannot exceed 14 seconds.
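To stay under the 10-second limit quoted above with the REST endpoint, one possible workaround is to cut the audio into shorter WAV segments and send each one as its own request. The following is only a minimal sketch under several assumptions that are not in the original post: the file is a canonical PCM WAV with a 44-byte header (real files may contain extra chunks), and the helper names `buildWavHeader` and `splitWav` are hypothetical. Cutting at fixed intervals is not real endpointing, so words at a boundary may be split, and the per-segment results are not equivalent to long dictation mode.

```javascript
// Assumption: canonical PCM WAV layout with a fixed 44-byte header.
var WAV_HEADER_BYTES = 44;

// Build a canonical 44-byte PCM WAV header for dataBytes bytes of audio.
function buildWavHeader(dataBytes, sampleRate, channels, bitsPerSample) {
    var header = new ArrayBuffer(WAV_HEADER_BYTES);
    var view = new DataView(header);
    var blockAlign = channels * bitsPerSample / 8;
    function writeAscii(offset, text) {
        for (var i = 0; i < text.length; i++) {
            view.setUint8(offset + i, text.charCodeAt(i));
        }
    }
    writeAscii(0, 'RIFF');
    view.setUint32(4, 36 + dataBytes, true);           // RIFF chunk size
    writeAscii(8, 'WAVE');
    writeAscii(12, 'fmt ');
    view.setUint32(16, 16, true);                      // fmt chunk size
    view.setUint16(20, 1, true);                       // audio format: PCM
    view.setUint16(22, channels, true);
    view.setUint32(24, sampleRate, true);
    view.setUint32(28, sampleRate * blockAlign, true); // byte rate
    view.setUint16(32, blockAlign, true);
    view.setUint16(34, bitsPerSample, true);
    writeAscii(36, 'data');
    view.setUint32(40, dataBytes, true);               // data chunk size
    return header;
}

// Split a WAV ArrayBuffer into WAV ArrayBuffers of at most maxSeconds each.
function splitWav(wavBuffer, maxSeconds) {
    var view = new DataView(wavBuffer);
    var channels = view.getUint16(22, true);
    var sampleRate = view.getUint32(24, true);
    var bitsPerSample = view.getUint16(34, true);
    var blockAlign = channels * bitsPerSample / 8;
    // Whole sample frames only, so a cut never lands inside a sample.
    var maxBytes = Math.floor(maxSeconds * sampleRate) * blockAlign;
    if (maxBytes <= 0) {
        throw new Error('maxSeconds is too small for this audio format');
    }
    var pcm = new Uint8Array(wavBuffer, WAV_HEADER_BYTES);
    var chunks = [];
    for (var start = 0; start < pcm.length; start += maxBytes) {
        var end = Math.min(start + maxBytes, pcm.length);
        var chunk = new Uint8Array(WAV_HEADER_BYTES + (end - start));
        var header = buildWavHeader(end - start, sampleRate, channels, bitsPerSample);
        chunk.set(new Uint8Array(header), 0);
        chunk.set(pcm.subarray(start, end), WAV_HEADER_BYTES);
        chunks.push(chunk.buffer);
    }
    return chunks;
}
```

Each element returned by `splitWav(data, 10)` could then be passed as the `data` field of a separate request like the one in the question, with the recognized texts concatenated in order.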
Maybe you need to implement the Client Library to use the different modes:
ShortPhrase mode: an utterance up to 15 seconds long. As data is sent to the server, the client will receive multiple partial results and one final multiple N-best choice result.
LongDictation mode: an utterance up to 2 minutes long. As data is sent to the server, the client will receive multiple partial results and multiple final results, based on where the server indicates sentence pauses.
Intent detection: The server returns additional structured information about the speech input. To use Intent you will need to first train a model. See details here.
Upvotes: 1