Is there a way to auto-generate a caption file in HTML using the SpeechRecognition API?

Question

Say I'm building a YouTube-style application and want to create auto-generated captions. I have the video as an .mp4 file, and I want to generate a .vtt file for it. Is there any way to do that with just the SpeechRecognition API and VTTCue objects? For example, could I extract the audio data from the .mp4, run it through the SpeechRecognition API, and get back a transcript?

From what I've seen so far, the SpeechRecognition API can only transcribe live microphone input. Is there a way to make it process existing audio data instead?

If it helps, I'm using React on the frontend and Node on the backend.
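
To make the target output concrete, here is a minimal sketch of the .vtt side only. It assumes a list of timed transcript segments is already available from some transcription step; the `Segment` shape and the `toTimestamp` and `toVtt` helpers are hypothetical names for illustration, not part of the SpeechRecognition API or any library.

```typescript
// Hypothetical shape for one transcribed segment (an assumption, not a real API type).
interface Segment {
  start: number; // cue start, in seconds
  end: number;   // cue end, in seconds
  text: string;  // caption text for this cue
}

// Format a time in seconds as a WebVTT timestamp: hh:mm:ss.ttt
function toTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Serialize segments into WebVTT text: a "WEBVTT" header, then one cue per
// segment, with blank lines separating the blocks.
function toVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg) => `${toTimestamp(seg.start)} --> ${toTimestamp(seg.end)}\n${seg.text}`
  );
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```

The resulting string could be written to a .vtt file in Node or served to a `<track>` element in the player. Producing the timed segments in the first place is the part this question is about.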

Answers (1)
