Reputation: 985
I'm fairly new to Azure's Speech SDK, so it's quite possible I'm missing something obvious; apologies if that's the case.
I've been working on a project where I want to translate an audio file/stream from one language to another. It works decently when the entire conversation is in one language (all Spanish), but it falls apart when I feed it a real conversation that mixes English and Spanish: it tries to recognize the English words as Spanish words (so it'll transcribe something like 'I'm sorry' as mangled Spanish).
From what I can tell, you can set multiple target languages (languages to translate into) but only one speechRecognitionLanguage. That seems to imply it can't handle conversations involving multiple languages (like a phone call with a translator) or speakers who flip between languages. Is there a way to make it work with multiple input languages, or is that just something Microsoft hasn't gotten around to yet?
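For example, as far as I can tell the config only takes a single source language but any number of targets (the language codes here are just examples, and subscriptionKey/serviceRegion stand in for my real credentials):

var sdk = require("microsoft-cognitiveservices-speech-sdk");

var translationConfig = sdk.SpeechTranslationConfig.fromSubscription(subscriptionKey, serviceRegion);
// only ONE recognition (source) language can be set...
translationConfig.speechRecognitionLanguage = "es-ES";
// ...but target languages can be added freely
translationConfig.addTargetLanguage("en");
translationConfig.addTargetLanguage("fr");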
Here's the code I have right now (it's just a lightly modified version of the example on their GitHub):
// pull in the required packages.
var sdk = require("microsoft-cognitiveservices-speech-sdk");

(function() {
    "use strict";

    module.exports = {
        main: function(settings, audioStream) {
            // now create the audio-config pointing to our stream and
            // the speech config specifying the language.
            var audioConfig = sdk.AudioConfig.fromStreamInput(audioStream);
            var translationConfig = sdk.SpeechTranslationConfig.fromSubscription(settings.subscriptionKey, settings.serviceRegion);

            // setting the recognition language.
            translationConfig.speechRecognitionLanguage = settings.language;
            // target language (to be translated to).
            translationConfig.addTargetLanguage("en");

            // create the translation recognizer.
            var recognizer = new sdk.TranslationRecognizer(translationConfig, audioConfig);

            recognizer.recognized = function (s, e) {
                if (e.result.reason === sdk.ResultReason.NoMatch) {
                    var noMatchDetail = sdk.NoMatchDetails.fromResult(e.result);
                    console.log("\r\nDidn't find a match: " + sdk.NoMatchReason[noMatchDetail.reason]);
                } else {
                    var str = "\r\nNext Line: " + e.result.text + "\nTranslations:";
                    var language = "en";
                    str += " [" + language + "] " + e.result.translations.get(language);
                    str += "\r\n";
                    console.log(str);
                }
            };

            // two possible states: Error or EndOfStream
            recognizer.canceled = function (s, e) {
                var str = "(cancel) Reason: " + sdk.CancellationReason[e.reason];
                // if it was because of an error
                if (e.reason === sdk.CancellationReason.Error) {
                    str += ": " + e.errorDetails;
                    console.log(str);
                }
                // we've reached the end of the file, stop the recognizer
                else {
                    recognizer.stopContinuousRecognitionAsync(
                        function() {
                            console.log("End of file.");
                            recognizer.close();
                            recognizer = undefined;
                        },
                        function(err) {
                            console.trace("err - " + err);
                            recognizer.close();
                            recognizer = undefined;
                        });
                }
            };

            // start the recognizer and wait for a result.
            recognizer.startContinuousRecognitionAsync(
                function () {
                    console.log("Starting speech recognition");
                },
                function (err) {
                    console.trace("err - " + err);
                    recognizer.close();
                    recognizer = undefined;
                }
            );
        }
    };
}());
Upvotes: 1
Views: 3805
Reputation: 436
As of now (August), the Speech SDK supports translation from one input language into multiple output languages.
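A minimal sketch of that one-to-many case (reusing the sdk, translationConfig and audioConfig setup from the question; the language codes are just examples):

translationConfig.speechRecognitionLanguage = "es-ES"; // single source language
translationConfig.addTargetLanguage("en");
translationConfig.addTargetLanguage("de");

var recognizer = new sdk.TranslationRecognizer(translationConfig, audioConfig);
recognizer.recognized = function (s, e) {
    // each result carries one translation per configured target language
    ["en", "de"].forEach(function (language) {
        console.log("[" + language + "] " + e.result.translations.get(language));
    });
};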
There are services in development that support recognition of the spoken language. These will enable us to run translation from multiple input languages into multiple output languages (you would specify both sets of languages in the config). There is no ETA for availability yet.
Wolfgang
Upvotes: 0
Reputation: 24148
According to the section Speech translation of the official document Language and region support for the Speech Services, quoted below, I think you can use Speech translation instead of Speech-to-text to realize your needs.
Speech translation
The Speech Translation API supports different languages for speech-to-speech and speech-to-text translation. The source language must always be from the Speech-to-Text language table. The available target languages depend on whether the translation target is speech or text. You may translate incoming speech into more than 60 languages. A subset of these languages are available for speech synthesis.
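For the speech-to-speech direction, my understanding is that you set a synthesis voice on the same config and handle the synthesizing event; a minimal sketch (the voice name is just an example and should be verified against the language support page, and subscriptionKey/serviceRegion/audioConfig are assumed to be set up as in the question):

var sdk = require("microsoft-cognitiveservices-speech-sdk");

var translationConfig = sdk.SpeechTranslationConfig.fromSubscription(subscriptionKey, serviceRegion);
translationConfig.speechRecognitionLanguage = "es-ES";
translationConfig.addTargetLanguage("en");
// request synthesized audio for the target language (example voice)
translationConfig.voiceName = "en-US-JessaRUS";

var recognizer = new sdk.TranslationRecognizer(translationConfig, audioConfig);
recognizer.synthesizing = function (s, e) {
    // e.result.audio is an ArrayBuffer holding a chunk of synthesized audio
    if (e.result.audio) {
        console.log("Received " + e.result.audio.byteLength + " bytes of synthesized audio.");
    }
};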
Meanwhile, there is the official sample code Azure-Samples/cognitive-services-speech-sdk/samples/js/node/translation.js for Speech translation.
I do not speak Spanish, so I cannot test an English-and-Spanish audio file for you.
Hope it helps.
Upvotes: 0