Lakshya Jain

Reputation: 115

Is there a way to auto-generate a caption file in HTML using the SpeechRecognition API?

Say I'm creating a YouTube-type application and want to offer auto-generated captions. I have the video's .mp4 file, and I want to generate a .vtt file for it. Is there any way to do that with just the SpeechRecognition API and VTTCues? For example, could I somehow get the audio data from the mp4, run it through the speech recognition API, and have it produce a transcript?

So far, what I've seen is that the SpeechRecognition API can only transcribe live microphone input. Is there a way to make it run on existing audio data instead?

If this helps, I'm using React on the frontend and Node on the backend.

Upvotes: 0

Views: 582

Answers (1)

Frank im Wald

Reputation: 918

I'm not sure about accessing the Speech API directly, but with the Azure Speech SDK you can send binary audio data straight to the recognizer.

Have a look at

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-speech-to-text?tabs=windowsinstall&pivots=programming-language-csharp#recognize-from-file

and

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-speech-to-text?tabs=windowsinstall&pivots=programming-language-csharp#recognize-from-in-memory-stream

All you have to do is to create your audioConfig like

var audioConfig = AudioConfig.FromWavFileInput("PathToFile.wav");

or

var audioConfig = AudioConfig.FromStreamInput(audioInputStream);

instead of

var audioConfig = AudioConfig.FromDefaultMicrophoneInput();

If the problem is reading the mp4: FromWavFileInput expects a WAV file, so the audio track has to be extracted and converted first. NAudio should be able to do that: https://github.com/naudio/NAudio
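
Once the recognizer returns phrases, the remaining step for the question is writing them out as a .vtt file. Below is a minimal sketch of that step in TypeScript, since the question mentions a Node backend. It assumes each recognized phrase comes with a start offset and a duration in 100-nanosecond ticks (as far as I know, that is how the Speech SDK reports them, e.g. OffsetInTicks in C#). The Segment shape and the function names formatTimestamp and toWebVtt are made up for illustration and are not part of any SDK.

```typescript
// Hypothetical shape for one recognized phrase (not an SDK type).
// Offsets and durations are assumed to be in 100-nanosecond ticks.
interface Segment {
  offsetTicks: number;
  durationTicks: number;
  text: string;
}

const TICKS_PER_MS = 10000;

// Convert a tick count into a WebVTT timestamp: hh:mm:ss.ttt
function formatTimestamp(ticks: number): string {
  const totalMs = Math.floor(ticks / TICKS_PER_MS);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Build the contents of a .vtt file from the recognized segments.
function toWebVtt(segments: Segment[]): string {
  const cues = segments
    // Collapse whitespace so a cue never contains a blank line.
    .map((seg) => ({ ...seg, text: seg.text.replace(/\s+/g, " ").trim() }))
    // Skip empty results (silence, no match).
    .filter((seg) => seg.text.length > 0)
    .map((seg, i) => {
      const start = formatTimestamp(seg.offsetTicks);
      const end = formatTimestamp(seg.offsetTicks + seg.durationTicks);
      return `${i + 1}\n${start} --> ${end}\n${seg.text}`;
    });
  // A WebVTT file starts with the "WEBVTT" header; cues are separated by blank lines.
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```

You would collect one Segment per recognized phrase during continuous recognition, call toWebVtt on the list, save the result as captions.vtt, and point a track element inside the video element at it. Long phrases may need to be split into shorter cues to be readable on screen; this sketch does not attempt that.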

Upvotes: 1
