Reputation: 115
Say I'm creating a YouTube-type application and want to auto-generate captions. I have the video's .mp4 file, and I want to generate a .vtt file for it. Is there any way to do that with just the SpeechRecognition API and VTTCues? For example, could I extract the audio data from the mp4, run it through the speech recognition API, and get a transcript back?
So far, what I've seen is that the SpeechRecognition API can only transcribe live microphone input. Is there a way to run it on existing audio data instead?
If this helps, I'm using React in my frontend and Node in my backend.
Upvotes: 0
Views: 582
Reputation: 918
Not sure about accessing the Speech API directly, but with the speech SDK you can send binary audio data directly to the recognizer.
Have a look at
and
All you have to do is create your audioConfig like this:
var audioConfig = AudioConfig.FromWavFileInput("PathToFile.wav");
or
var audioConfig = AudioConfig.FromStreamInput(audioInputStream);
instead of
var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
If the problem is reading the mp4, NAudio should be able to do that: https://github.com/naudio/NAudio
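Once a recognizer gives you phrases with start and end times, producing the .vtt file is mostly string formatting. Below is a minimal sketch in TypeScript (since you mentioned Node), not tied to any particular SDK. The `Segment` shape and the function names are hypothetical; you would need to map whatever your recognizer actually returns into that shape.

```typescript
// Hypothetical shape for one recognized phrase. A real recognizer's result
// object will look different and has to be mapped into this.
interface Segment {
  startMs: number; // start offset in milliseconds
  endMs: number;   // end offset in milliseconds
  text: string;
}

// Left-pad a number with zeros to a fixed width.
function pad(value: number, width: number): string {
  let out = String(value);
  while (out.length < width) out = "0" + out;
  return out;
}

// Format milliseconds as a WebVTT timestamp: hh:mm:ss.ttt
function toVttTimestamp(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}.${pad(millis, 3)}`;
}

// Escape characters that are special in cue text and collapse whitespace,
// so a phrase can never contain a blank line or a "-->" sequence.
function escapeCueText(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/\s+/g, " ")
    .trim();
}

// Build the full file: "WEBVTT" header, then numbered cues separated by blank lines.
function segmentsToVtt(segments: Segment[]): string {
  const cues = segments.map(
    (s, i) =>
      `${i + 1}\n${toVttTimestamp(s.startMs)} --> ${toVttTimestamp(s.endMs)}\n${escapeCueText(s.text)}`
  );
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```

On the Node side you could then write the result of `segmentsToVtt(segments)` to a file with `fs.writeFileSync` and serve it through a `<track>` element next to the video.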
Upvotes: 1