
Tags: c#, azure, audio, speech-recognition, mp3

Reputation: 93

.NET Core console app: Azure Cognitive Services and MP3

I'm trying to use Azure Cognitive Services Speech to Text, and I am hitting a roadblock in .NET Core.

I have native support for WAV files using AudioConfig.FromWavFileInput(), which is great.

However, I also need to support MP3s.

I have found the documentation on compressed audio support: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-codec-compressed-audio-input-streams?tabs=debian&pivots=programming-language-csharp

However, it relies on push audio input streams (PushAudioInputStream).

This is where I'm getting lost.

I have also found this example for streaming codec-compressed audio: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/cpp/linux/compressed-audio-input/compressed-audio-input.cpp

However, it is C++, not C# .NET Core, and porting code is not really my strong suit.

So I'm at a bit of a loss.

Any assistance would be greatly appreciated.
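The C++ sample linked in the question feeds compressed audio through a push stream. A rough C# counterpart might look like the sketch below. It assumes the Microsoft.CognitiveServices.Speech NuGet package is referenced and GStreamer is installed as the compressed-audio documentation describes; Mp3PushRecognition, PumpFile and RecognizeMp3Async are hypothetical names, and the key and region are placeholders. RecognizeOnceAsync returns only the first utterance, so a long MP3 would need continuous recognition instead.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Sketch only: assumes the Speech SDK NuGet package and GStreamer
// (used by the SDK to decode compressed input) are installed.
public static class Mp3PushRecognition
{
    // Hypothetical helper: copies `source` to `write` in fixed-size chunks
    // and returns the total number of bytes forwarded.
    public static long PumpFile(Stream source, Action<byte[], int> write, int chunkSize = 4096)
    {
        var buffer = new byte[chunkSize];
        long total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            write(buffer, read);
            total += read;
        }
        return total;
    }

    // Recognizes the first utterance of an MP3 file via a compressed push stream.
    public static async Task<string> RecognizeMp3Async(string key, string region, string mp3Path)
    {
        var speechConfig = SpeechConfig.FromSubscription(key, region);

        // Tell the SDK that the pushed bytes are MP3 rather than raw PCM.
        var format = AudioStreamFormat.GetCompressedFormat(AudioStreamContainerFormat.MP3);
        using var pushStream = AudioInputStream.CreatePushStream(format);
        using var audioConfig = AudioConfig.FromStreamInput(pushStream);
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        // Start recognition, then feed the file into the stream.
        var recognizeTask = recognizer.RecognizeOnceAsync();
        using (var file = File.OpenRead(mp3Path))
        {
            PumpFile(file, (buf, n) => pushStream.Write(buf, n));
        }
        pushStream.Close(); // signals the end of the audio

        var result = await recognizeTask;
        if (result.Reason == ResultReason.RecognizedSpeech)
            return result.Text;
        if (result.Reason == ResultReason.Canceled)
        {
            var details = CancellationDetails.FromResult(result);
            throw new InvalidOperationException($"Canceled: {details.Reason} {details.ErrorDetails}");
        }
        return string.Empty; // e.g. ResultReason.NoMatch
    }
}
```

Usage would be along the lines of `var text = await Mp3PushRecognition.RecognizeMp3Async("<key>", "<region>", "audio.mp3");`. The WAV path can stay on FromWavFileInput; only the MP3 path needs the compressed stream format.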

Upvotes: 1

Views: 1154

Answers (2)

Reputation: 188

If you have files, especially several of them, you can benefit from using batch transcription. It natively supports files in WAV, MP3 and OGG format.

The documentation links to the API reference, which also covers model customization. There you can select the region you are interested in and export a Swagger file, which you can use to generate a client in the programming language of your choice.

For your scenario you only need four APIs, and you could use the standard HttpClient to execute the requests. You would want to:

1. Create a batch transcription.
2. Get your transcription to check its status. If it has completed, you get the URL you need next; if it has failed, you get a message describing the problem.
3. Get the results after the batch transcription has succeeded. The file with the kind TranscriptionReport lists the files that were transcribed, whether each transcription succeeded and, if not, why. The other files contain the results of the successful transcriptions. (Here you need to iterate over the contentUrls to download the files.)
4. Delete the transcription(s) after you have got the results.
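The four steps above can be sketched with a plain HttpClient. This is a minimal sketch under assumptions not stated in the answer: it uses the v3.1 REST paths and the JSON field names status, self, values, kind and links.contentUrl, which should be checked against the Swagger file for your region. The audio must already be reachable by URL, and BatchTranscription and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Sketch of the four batch-transcription calls with a plain HttpClient.
// ASSUMPTION: v3.1 paths and JSON field names; confirm against the Swagger file.
public static class BatchTranscription
{
    // Body for step 1: the audio files must be reachable by URL.
    public static string BuildCreateBody(IEnumerable<string> audioUrls, string locale, string displayName) =>
        JsonSerializer.Serialize(new { contentUrls = audioUrls.ToArray(), locale, displayName });

    // Reads "status" (e.g. NotStarted, Running, Succeeded, Failed) for step 2.
    public static string GetStatus(string transcriptionJson)
    {
        using var doc = JsonDocument.Parse(transcriptionJson);
        return doc.RootElement.GetProperty("status").GetString();
    }

    // For step 3: keeps the contentUrl of every file whose kind is
    // "Transcription", skipping the "TranscriptionReport".
    public static List<string> GetTranscriptionContentUrls(string filesJson)
    {
        using var doc = JsonDocument.Parse(filesJson);
        return doc.RootElement.GetProperty("values").EnumerateArray()
            .Where(f => f.GetProperty("kind").GetString() == "Transcription")
            .Select(f => f.GetProperty("links").GetProperty("contentUrl").GetString())
            .ToList();
    }

    public static async Task<List<string>> RunAsync(string key, string region, IEnumerable<string> audioUrls)
    {
        using var api = new HttpClient();
        api.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", key);
        var baseUrl = $"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions";

        // 1. Create the batch transcription; its "self" URL identifies it.
        var content = new StringContent(BuildCreateBody(audioUrls, "en-US", "mp3-batch"), Encoding.UTF8, "application/json");
        var created = await api.PostAsync(baseUrl, content);
        created.EnsureSuccessStatusCode();
        string self;
        using (var doc = JsonDocument.Parse(await created.Content.ReadAsStringAsync()))
            self = doc.RootElement.GetProperty("self").GetString();

        // 2. Poll until the transcription has succeeded or failed.
        string json, status;
        do
        {
            await Task.Delay(TimeSpan.FromSeconds(10));
            json = await api.GetStringAsync(self);
            status = GetStatus(json);
        } while (status != "Succeeded" && status != "Failed");
        if (status == "Failed")
            throw new InvalidOperationException("Batch transcription failed: " + json);

        // 3. List the result files and download each transcription.
        //    ASSUMPTION: contentUrl is pre-signed, so no key is sent with it.
        using var download = new HttpClient();
        var results = new List<string>();
        foreach (var url in GetTranscriptionContentUrls(await api.GetStringAsync(self + "/files")))
            results.Add(await download.GetStringAsync(url));

        // 4. Delete the transcription once the results are downloaded.
        (await api.DeleteAsync(self)).EnsureSuccessStatusCode();
        return results;
    }
}
```

Polling every ten seconds is an arbitrary choice; batch jobs can take minutes, so a longer interval or a timeout may suit real use better.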

Upvotes: 0

Reputation: 36

This sample: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_recognition_samples.cs has methods specific to compressed audio here and here. The latter, a pull stream sample, seems pretty straightforward: just plug in your key, region, and file path.
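With a pull stream, the SDK asks for audio on demand through a callback instead of you pushing bytes into it. A minimal sketch of that idea, assuming the Microsoft.CognitiveServices.Speech package and GStreamer are installed; StreamPullCallback and Mp3PullRecognition are hypothetical names, not the helper classes used in the linked sample.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical callback: the SDK calls Read whenever it needs more audio.
// The wrapped stream is owned (and disposed) by the caller.
public sealed class StreamPullCallback : PullAudioInputStreamCallback
{
    private readonly Stream _source;

    public StreamPullCallback(Stream source) => _source = source;

    // Copies up to `size` bytes into dataBuffer; returning 0 signals end of stream.
    public override int Read(byte[] dataBuffer, uint size) =>
        _source.Read(dataBuffer, 0, (int)Math.Min(size, (uint)dataBuffer.Length));
}

public static class Mp3PullRecognition
{
    public static async Task<string> RecognizeMp3Async(string key, string region, string mp3Path)
    {
        var speechConfig = SpeechConfig.FromSubscription(key, region);
        var format = AudioStreamFormat.GetCompressedFormat(AudioStreamContainerFormat.MP3);

        using var file = File.OpenRead(mp3Path);
        var callback = new StreamPullCallback(file);
        using var pullStream = AudioInputStream.CreatePullStream(callback, format);
        using var audioConfig = AudioConfig.FromStreamInput(pullStream);
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        // Returns only the first recognized utterance.
        var result = await recognizer.RecognizeOnceAsync();
        return result.Reason == ResultReason.RecognizedSpeech ? result.Text : string.Empty;
    }
}
```

Compared with the push version, there is no separate feeding loop: the SDK drives the reads, which keeps the calling code short when the whole input is a file on disk.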

Upvotes: 1
