Reputation: 985
I'm fairly new to Azure's Speech SDK, so it's quite possible I'm missing something obvious; apologies if that's the case.
I've been working on a project where I want to translate an audio file/stream from one language to another. It works decently when the entire conversation is in one language (all Spanish), but it falls apart when I feed it a real conversation that mixes English and Spanish: it tries to recognize the English words as Spanish words (so it'll transcribe something like 'I'm sorry' as mangled Spanish).
From what I can tell, you can set multiple target languages (languages to translate into) but only one speechRecognitionLanguage. That seems to imply it can't handle conversations involving multiple languages (like a phone call with a translator) or speakers who flip between languages. Is there a way to make it work with multiple input languages, or is that just something Microsoft hasn't gotten around to yet?
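For example, as far as I can tell the config only takes a single source language but any number of targets (the language codes here are just examples, and subscriptionKey/serviceRegion stand in for my real credentials):

var sdk = require("microsoft-cognitiveservices-speech-sdk");

var translationConfig = sdk.SpeechTranslationConfig.fromSubscription(subscriptionKey, serviceRegion);
// only ONE recognition (source) language can be set...
translationConfig.speechRecognitionLanguage = "es-ES";
// ...but target languages can be added freely
translationConfig.addTargetLanguage("en");
translationConfig.addTargetLanguage("fr");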
Here's the code I have right now (it's just a lightly modified version of the example on their GitHub):
// pull in the required packages.
var sdk = require("microsoft-cognitiveservices-speech-sdk");

(function() {
    "use strict";

    module.exports = {
        main: function(settings, audioStream) {
            // now create the audio-config pointing to our stream and
            // the speech config specifying the language.
            var audioConfig = sdk.AudioConfig.fromStreamInput(audioStream);
            var translationConfig = sdk.SpeechTranslationConfig.fromSubscription(settings.subscriptionKey, settings.serviceRegion);

            // setting the recognition language.
            translationConfig.speechRecognitionLanguage = settings.language;
            // target language (to be translated to).
            translationConfig.addTargetLanguage("en");

            // create the translation recognizer.
            var recognizer = new sdk.TranslationRecognizer(translationConfig, audioConfig);

            recognizer.recognized = function (s, e) {
                if (e.result.reason === sdk.ResultReason.NoMatch) {
                    var noMatchDetail = sdk.NoMatchDetails.fromResult(e.result);
                    console.log("\r\nDidn't find a match: " + sdk.NoMatchReason[noMatchDetail.reason]);
                } else {
                    var str = "\r\nNext Line: " + e.result.text + "\nTranslations:";
                    var language = "en";
                    str += " [" + language + "] " + e.result.translations.get(language);
                    str += "\r\n";
                    console.log(str);
                }
            };

            // two possible states: Error or EndOfStream
            recognizer.canceled = function (s, e) {
                var str = "(cancel) Reason: " + sdk.CancellationReason[e.reason];
                // if it was because of an error
                if (e.reason === sdk.CancellationReason.Error) {
                    str += ": " + e.errorDetails;
                    console.log(str);
                }
                // we've reached the end of the file, stop the recognizer
                else {
                    recognizer.stopContinuousRecognitionAsync(
                        function() {
                            console.log("End of file.");
                            recognizer.close();
                            recognizer = undefined;
                        },
                        function(err) {
                            console.trace("err - " + err);
                            recognizer.close();
                            recognizer = undefined;
                        });
                }
            };

            // start the recognizer and wait for a result.
            recognizer.startContinuousRecognitionAsync(
                function () {
                    console.log("Starting speech recognition");
                },
                function (err) {
                    console.trace("err - " + err);
                    recognizer.close();
                    recognizer = undefined;
                }
            );
        }
    };
}());
Upvotes: 1
Views: 3805
Reputation: 436
As of now (August), the Speech SDK supports translation from one input language into multiple output languages.
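A minimal sketch of that one-to-many case (reusing the sdk, translationConfig and audioConfig setup from the question; the language codes are just examples):

translationConfig.speechRecognitionLanguage = "es-ES"; // single source language
translationConfig.addTargetLanguage("en");
translationConfig.addTargetLanguage("de");

var recognizer = new sdk.TranslationRecognizer(translationConfig, audioConfig);
recognizer.recognized = function (s, e) {
    // each result carries one translation per configured target language
    ["en", "de"].forEach(function (language) {
        console.log("[" + language + "] " + e.result.translations.get(language));
    });
};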
There are services in development that support recognition of the spoken language. These will enable us to run translation from multiple input languages into multiple output languages (you would specify both sets of languages in the config). There is no ETA for availability yet.
Wolfgang
Upvotes: 0
Reputation: 24148
According to the section Speech translation of the official document Language and region support for the Speech Services, quoted below, I think you can use Speech translation instead of Speech-to-text to realize your needs.
Speech translation
The Speech Translation API supports different languages for speech-to-speech and speech-to-text translation. The source language must always be from the Speech-to-Text language table. The available target languages depend on whether the translation target is speech or text. You may translate incoming speech into more than 60 languages. A subset of these languages are available for speech synthesis.
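For the speech-to-speech direction, my understanding is that you set a synthesis voice on the same config and handle the synthesizing event; a minimal sketch (the voice name is just an example and should be verified against the language support page, and subscriptionKey/serviceRegion/audioConfig are assumed to be set up as in the question):

var sdk = require("microsoft-cognitiveservices-speech-sdk");

var translationConfig = sdk.SpeechTranslationConfig.fromSubscription(subscriptionKey, serviceRegion);
translationConfig.speechRecognitionLanguage = "es-ES";
translationConfig.addTargetLanguage("en");
// request synthesized audio for the target language (example voice)
translationConfig.voiceName = "en-US-JessaRUS";

var recognizer = new sdk.TranslationRecognizer(translationConfig, audioConfig);
recognizer.synthesizing = function (s, e) {
    // e.result.audio is an ArrayBuffer holding a chunk of synthesized audio
    if (e.result.audio) {
        console.log("Received " + e.result.audio.byteLength + " bytes of synthesized audio.");
    }
};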
Meanwhile, there is the official sample code Azure-Samples/cognitive-services-speech-sdk/samples/js/node/translation.js for Speech translation.
I do not speak Spanish, so I cannot test an English-and-Spanish audio file for you.
Hope it helps.
Upvotes: 0