Reputation: 93
I'm trying to use Azure Cognitive Services Speech to Text in .NET Core, and I'm hitting a roadblock.
I have native support for WAV files using AudioConfig.FromWavFileInput(), which is great.
However, I also need to support MP3s.
I have found the documentation on compressed audio support: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-codec-compressed-audio-input-streams?tabs=debian&pivots=programming-language-csharp
However, it refers to push audio streams, and this is where I'm getting lost.
I have also found this example of streaming codec-compressed audio: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/cpp/linux/compressed-audio-input/compressed-audio-input.cpp
However, it is C++ rather than C#/.NET Core, and porting code is not really my strong suit.
So I'm at a bit of a loss.
Any assistance would be greatly appreciated.
Upvotes: 1
Views: 1154
Reputation: 188
If you have files, especially several of them, you can benefit from batch transcription, which natively supports WAV, MP3, and OGG files.
The documentation links to the REST API reference, which also covers model customization. There you can select the region you are interested in and export a Swagger file, which you can use to generate a client in the programming language of your choice.
For your scenario you only need four of those APIs, and you can use the standard HttpClient to execute the requests. You would want to
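A minimal sketch of creating and polling a batch transcription with plain HttpClient, assuming the v3.1 REST endpoint (`https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions`) and an MP3 that is reachable by URL (for example a blob SAS URL), since batch transcription reads audio from URLs rather than from an uploaded file. The class and method names are hypothetical, and the locale and display name are placeholders.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper; endpoint version and field names assume the v3.1 REST API.
public static class BatchTranscriptionSketch
{
    // Builds the JSON body for POST .../speechtotext/v3.1/transcriptions.
    public static string BuildCreateRequest(string audioUrl, string locale, string displayName)
    {
        var body = new
        {
            contentUrls = new[] { audioUrl }, // audio must be reachable by URL
            locale,
            displayName
        };
        return JsonSerializer.Serialize(body);
    }

    // Creates the transcription and returns its "self" URL, used for polling.
    public static async Task<string> CreateTranscriptionAsync(
        HttpClient http, string region, string key, string audioUrl)
    {
        var endpoint = $"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions";
        using var request = new HttpRequestMessage(HttpMethod.Post, endpoint);
        request.Headers.Add("Ocp-Apim-Subscription-Key", key);
        request.Content = new StringContent(
            BuildCreateRequest(audioUrl, "en-US", "mp3 transcription"),
            Encoding.UTF8, "application/json");

        using var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        return doc.RootElement.GetProperty("self").GetString()
            ?? throw new InvalidOperationException("Response had no 'self' URL.");
    }

    // Polls the transcription every 10 seconds until it succeeds or fails.
    public static async Task<string> WaitForCompletionAsync(HttpClient http, string key, string selfUrl)
    {
        while (true)
        {
            using var request = new HttpRequestMessage(HttpMethod.Get, selfUrl);
            request.Headers.Add("Ocp-Apim-Subscription-Key", key);
            using var response = await http.SendAsync(request);
            response.EnsureSuccessStatusCode();

            using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
            var status = doc.RootElement.GetProperty("status").GetString();
            if (status == "Succeeded" || status == "Failed")
                return status;

            await Task.Delay(TimeSpan.FromSeconds(10));
        }
    }
}
```

Once the status is `Succeeded`, the result JSON files can be listed from the `files` sub-resource of the returned `self` URL with the same key header.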
Upvotes: 0
Reputation: 36
This sample: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_recognition_samples.cs has methods specific to compressed audio here and here. The latter, a pull-stream sample, seems pretty straightforward: just plug in your key, region, and file path.
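A minimal sketch of recognizing an MP3 through a pull stream, assuming the `Microsoft.CognitiveServices.Speech` NuGet package and GStreamer installed as described in the compressed-audio article linked in the question. `StreamPullCallback` and `Mp3Recognition.RecognizeMp3Async` are hypothetical names, not part of the SDK or the linked sample.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical pull callback: the SDK calls Read whenever it wants more compressed bytes.
public sealed class StreamPullCallback : PullAudioInputStreamCallback
{
    private readonly Stream _source;

    public StreamPullCallback(Stream source) => _source = source;

    // Returns the number of bytes copied into dataBuffer; 0 signals end of stream.
    public override int Read(byte[] dataBuffer, uint size) =>
        _source.Read(dataBuffer, 0, (int)size);
}

public static class Mp3Recognition
{
    public static async Task RecognizeMp3Async(string key, string region, string mp3Path)
    {
        var speechConfig = SpeechConfig.FromSubscription(key, region);

        // Tell the SDK the incoming bytes are MP3 rather than raw PCM/WAV.
        var format = AudioStreamFormat.GetCompressedFormat(AudioStreamContainerFormat.MP3);

        // The file must stay open until recognition finishes; disposal is in reverse order.
        using var file = File.OpenRead(mp3Path);
        var callback = new StreamPullCallback(file);
        using var stream = AudioInputStream.CreatePullStream(callback, format);
        using var audioConfig = AudioConfig.FromStreamInput(stream);
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        var stopped = new TaskCompletionSource<bool>();

        // Print each finalized phrase as the file is processed.
        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
                Console.WriteLine(e.Result.Text);
        };
        // Canceled fires at end of stream as well as on errors.
        recognizer.Canceled += (s, e) =>
        {
            if (e.Reason == CancellationReason.Error)
                Console.WriteLine($"Error: {e.ErrorDetails}");
            stopped.TrySetResult(true);
        };
        recognizer.SessionStopped += (s, e) => stopped.TrySetResult(true);

        await recognizer.StartContinuousRecognitionAsync();
        await stopped.Task;
        await recognizer.StopContinuousRecognitionAsync();
    }
}
```

Usage would be something like `await Mp3Recognition.RecognizeMp3Async("<key>", "<region>", "audio.mp3");`. Continuous recognition is used here because `RecognizeOnceAsync` returns after a single utterance, which would cut off most of a longer file. If you prefer the push model from the question, `AudioInputStream.CreatePushStream(format)` with `Write` calls followed by `Close` should work with the same compressed format.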
Upvotes: 1