Mayur

Reputation: 3

Speaker Identification In Azure Speech Translator Service

I'm trying to identify speakers (Speaker-1, Speaker-2, and so on) from audio input (file or microphone) in any language using the Azure Speech Translator service. The audio input may be in any language, and I'm translating it into English with the service. I can get the translation result, but the speakers are not tagged.

In my code I am unable to get e.Result.UserID for speaker identification.

Please help me find a solution to this problem.

Thank you.

The code sample I'm trying is below:

recognizer.Recognized += (s, e) =>
{
    if (e.Result.Reason == ResultReason.TranslatedSpeech)
    {
        foreach (var element in e.Result.Translations)
        {
            Console.WriteLine($"TRANSLATING into '{element.Key}': {element.Value}");
        }
    }
};

I need the output as shown in the attached screenshot: Speech Recognition

Upvotes: 0

Views: 346

Answers (1)

Sampath

Reputation: 3523

There is no direct "UserID" property in the Azure Speech Translator service for C#. Here are the relevant concepts and approaches, starting with what you can get from ResultId:

  • The TranslationRecognitionResult exposes ResultId, Reason, and Text, so each recognized segment can at least be tagged with its ResultId:
    using System;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    using Microsoft.CognitiveServices.Speech.Translation;

    class Program
    {
        static string speechKey = "AzureSpeechServiceKEY";
        static string speechRegion = "AzureSpeechServiceRegion";

        static void OutputSpeechRecognitionResult(TranslationRecognitionResult translationRecognitionResult)
        {
            // Handle the different recognition results
            switch (translationRecognitionResult.Reason)
            {
                case ResultReason.TranslatedSpeech:
                    // Output the result ID and recognized text
                    Console.WriteLine($"RESULT ID: {translationRecognitionResult.ResultId}");
                    Console.WriteLine($"RECOGNIZED: Text={translationRecognitionResult.Text}");

                    foreach (var element in translationRecognitionResult.Translations)
                    {
                        Console.WriteLine($"TRANSLATED into '{element.Key}': {element.Value}");
                    }
                    break;
                case ResultReason.NoMatch:
                    // Handle the case where speech could not be recognized
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                    break;
                case ResultReason.Canceled:
                    // Handle cancellation reasons, e.g., errors
                    var cancellation = CancellationDetails.FromResult(translationRecognitionResult);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                        Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
                    }
                    break;
            }
        }

        async static Task Main(string[] args)
        {
            // Set up SpeechTranslationConfig with the subscription key and region
            var speechTranslationConfig = SpeechTranslationConfig.FromSubscription(speechKey, speechRegion);
            speechTranslationConfig.SpeechRecognitionLanguage = "en-US";
            speechTranslationConfig.AddTargetLanguage("it");

            // Set up audio configuration from the default microphone input
            using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();

            // Initialize the TranslationRecognizer
            using var translationRecognizer = new TranslationRecognizer(speechTranslationConfig, audioConfig);

            // Recognize a single utterance and output the result
            Console.WriteLine("Speak into your microphone.");
            var translationRecognitionResult = await translationRecognizer.RecognizeOnceAsync();
            OutputSpeechRecognitionResult(translationRecognitionResult);
        }
    }




For continuous recognition, register an event handler on the recognizer and print the ResultId for each recognized segment:

    using System;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    using Microsoft.CognitiveServices.Speech.Translation;

    class Program
    {
        static string speechKey = "AzureSpeechServiceKEY";
        static string speechRegion = "AzureSpeechServiceRegion";

        static void OutputSpeechRecognitionResult(object sender, TranslationRecognitionEventArgs e)
        {
            // Print the ResultId for every event
            Console.WriteLine($"ResultId: {e.Result.ResultId}");

            // Handle the different recognition results
            switch (e.Result.Reason)
            {
                case ResultReason.TranslatedSpeech:
                    // Output the recognized and translated text
                    Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                    foreach (var element in e.Result.Translations)
                    {
                        Console.WriteLine($"TRANSLATED into '{element.Key}': {element.Value}");
                    }
                    break;
                case ResultReason.NoMatch:
                    // Handle the case where speech could not be recognized
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                    break;
                case ResultReason.Canceled:
                    // Handle cancellation reasons, e.g., errors
                    var cancellation = CancellationDetails.FromResult(e.Result);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                        Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
                    }
                    break;
            }
        }

        async static Task Main(string[] args)
        {
            // Set up SpeechTranslationConfig with the subscription key and region
            var speechTranslationConfig = SpeechTranslationConfig.FromSubscription(speechKey, speechRegion);
            speechTranslationConfig.SpeechRecognitionLanguage = "en-US";
            speechTranslationConfig.AddTargetLanguage("it");

            // Set up audio configuration from the default microphone input
            using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();

            // Initialize the TranslationRecognizer
            using var translationRecognizer = new TranslationRecognizer(speechTranslationConfig, audioConfig);

            // Register the event handler for final results (Recognized, not Recognizing,
            // so that ResultReason.TranslatedSpeech is actually raised)
            translationRecognizer.Recognized += OutputSpeechRecognitionResult;

            // Start continuous recognition
            Console.WriteLine("Speak into your microphone.");
            await translationRecognizer.StartContinuousRecognitionAsync();

            // Keep the application running to continue capturing speech
            Console.ReadLine();

            // Stop recognition
            await translationRecognizer.StopContinuousRecognitionAsync();
        }
    }


To get element.Value with a simple speaker tag:

    using System;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    using Microsoft.CognitiveServices.Speech.Translation;

    class Program
    {
        static string speechKey = "AzureSpeechServiceKEY";
        static string speechRegion = "AzureSpeechServiceRegion";

        static void OutputSpeechRecognitionResult(object sender, TranslationRecognitionEventArgs e)
        {
            // Print the ResultId for every event
            Console.WriteLine($"ResultId: {e.Result.ResultId}");

            // Handle the different recognition results
            switch (e.Result.Reason)
            {
                case ResultReason.TranslatedSpeech:
                    // Output the recognized text
                    Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");

                    foreach (var element in e.Result.Translations)
                    {
                        // The key of each translation is the target language configured below
                        string targetLanguage = "it";
                        Console.WriteLine($"TRANSLATED into '{element.Key}': {element.Value}");

                        // Attach a static tag to the target-language translation.
                        // Note: this only labels the output; it does not distinguish speakers.
                        if (element.Key == targetLanguage)
                        {
                            string speakerTag = $"Speaker 1- {element.Value}";
                            Console.WriteLine($"SPEAKER TAG: {speakerTag}");
                        }
                    }
                    break;
                case ResultReason.NoMatch:
                    // Handle the case where speech could not be recognized
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                    break;
                case ResultReason.Canceled:
                    // Handle cancellation reasons, e.g., errors
                    var cancellation = CancellationDetails.FromResult(e.Result);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                        Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
                    }
                    break;
            }
        }

        async static Task Main(string[] args)
        {
            // Set up SpeechTranslationConfig with the subscription key and region
            var speechTranslationConfig = SpeechTranslationConfig.FromSubscription(speechKey, speechRegion);
            speechTranslationConfig.SpeechRecognitionLanguage = "en-US";
            speechTranslationConfig.AddTargetLanguage("it");

            // Set up audio configuration from the default microphone input
            using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();

            // Initialize the TranslationRecognizer
            using var translationRecognizer = new TranslationRecognizer(speechTranslationConfig, audioConfig);

            // Register the event handler for final results
            translationRecognizer.Recognized += OutputSpeechRecognitionResult;

            // Start continuous recognition
            Console.WriteLine("Speak into your microphone.");
            await translationRecognizer.StartContinuousRecognitionAsync();

            // Keep the application running to continue capturing speech
            Console.ReadLine();

            // Stop recognition
            await translationRecognizer.StopContinuousRecognitionAsync();
        }
    }



  • Alternatively, with the conversation transcription (diarization) API, speakers are identified as Guest-1, Guest-2, and so on, depending on the number of speakers in the conversation. The code is taken from the DOC and the audio file from git; a minimal sketch follows below.
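
A minimal sketch of that approach, assuming the Speech SDK's ConversationTranscriber from the real-time diarization quickstart; the file name katiesteve.wav stands in for the sample conversation audio from the quickstart repository:

    using System;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    using Microsoft.CognitiveServices.Speech.Transcription;

    class Program
    {
        static string speechKey = "AzureSpeechServiceKEY";
        static string speechRegion = "AzureSpeechServiceRegion";

        async static Task Main(string[] args)
        {
            var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
            speechConfig.SpeechRecognitionLanguage = "en-US";

            // File name assumed from the quickstart sample data; use your own conversation audio
            using var audioConfig = AudioConfig.FromWavFileInput("katiesteve.wav");
            using var conversationTranscriber = new ConversationTranscriber(speechConfig, audioConfig);

            var stopRecognition = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);

            conversationTranscriber.Transcribed += (s, e) =>
            {
                if (e.Result.Reason == ResultReason.RecognizedSpeech)
                {
                    // SpeakerId is reported as Guest-1, Guest-2, ... once diarization has enough audio
                    Console.WriteLine($"TRANSCRIBED: Text={e.Result.Text} Speaker ID={e.Result.SpeakerId}");
                }
            };

            conversationTranscriber.Canceled += (s, e) =>
            {
                Console.WriteLine($"CANCELED: Reason={e.Reason}");
                stopRecognition.TrySetResult(0);
            };

            conversationTranscriber.SessionStopped += (s, e) => stopRecognition.TrySetResult(0);

            // Transcribe until the session stops or is canceled
            await conversationTranscriber.StartTranscribingAsync();
            await stopRecognition.Task;
            await conversationTranscriber.StopTranscribingAsync();
        }
    }

Note that this diarizes on the recognition side only; TranslationRecognizer itself does not report speaker IDs, so the transcribed text would need to be translated in a separate step.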


Upvotes: 0
