SFSpeechRecognizer from Local Video

Question

I'm trying to implement speech transcription (voice to text) from a video. My approach breaks this down into three steps:

Convert video to audio file (m4a/mp3)
Pass the audio file URL to an SFSpeechRecognizer request
Parse the results
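
Assuming step 1 produces an audio file on disk, steps 2 and 3 might look roughly like the sketch below. It uses SFSpeechURLRecognitionRequest and reads bestTranscription.formattedString from the final result. The names transcribeAudioFile and TranscriptionError and the en-US default locale are hypothetical choices. The sketch also assumes the app's Info.plist contains NSSpeechRecognitionUsageDescription and that SFSpeechRecognizer.requestAuthorization has already been called and granted.

```swift
import Foundation
import Speech

/// Hypothetical error type for this sketch.
enum TranscriptionError: Error {
    case recognizerUnavailable
}

/// Step 2: run speech recognition on an audio file.
/// Step 3: return the best transcription of the final result as a single string.
/// Assumes speech-recognition authorization has already been granted.
func transcribeAudioFile(at audioURL: URL,
                         locale: Locale = Locale(identifier: "en-US"),
                         completion: @escaping (Result<String, Error>) -> Void) {
    guard let recognizer = SFSpeechRecognizer(locale: locale), recognizer.isAvailable else {
        completion(.failure(TranscriptionError.recognizerUnavailable))
        return
    }
    let request = SFSpeechURLRecognitionRequest(url: audioURL)
    request.shouldReportPartialResults = false // only deliver the final result

    // In an app, keep a strong reference to the recognizer (e.g. as a property)
    // so it stays alive for the duration of the task.
    _ = recognizer.recognitionTask(with: request) { result, error in
        if let error = error {
            completion(.failure(error))
            return
        }
        guard let result = result, result.isFinal else { return }
        completion(.success(result.bestTranscription.formattedString))
    }
}
```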

My issue is that I haven't found a way to convert the source video file (let's say a .mov) into an audio-only file. The video's AVAsset doesn't report any audio tracks, yet the file plays with sound, so the audio does exist.
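To see what the asset actually contains, one option is to load its tracks with the async load API (macOS 12 / iOS 15 and later; the synchronous tracks property is deprecated on newer SDKs) and print each track's ID and media type. This shows whether the sound appears as .audio ("soun") or under another type such as .muxed ("muxx"). The function name listTracks is hypothetical, and this is a diagnostic sketch, not a confirmed explanation of the missing track.

```swift
import AVFoundation

/// Loads all tracks of a local media file and describes each one as "trackID: mediaType".
@available(macOS 12.0, iOS 15.0, *)
func listTracks(of videoURL: URL) async throws -> [String] {
    let asset = AVURLAsset(url: videoURL)
    // Load the tracks asynchronously rather than reading the deprecated synchronous property.
    let tracks = try await asset.load(.tracks)
    return tracks.map { track in
        // mediaType.rawValue is a four-character code, e.g. "soun" for audio, "vide" for video.
        "\(track.trackID): \(track.mediaType.rawValue)"
    }
}
```

Call it from an async context, for example `try await listTracks(of: videoURL)`, and check whether any entry ends in "soun".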

I expect that once step 1 is solved, steps 2 and 3 will be straightforward, so my question is: what is the best way to convert a video file into an audio-only file that I can then use for transcription?
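For step 1 itself, one common approach is AVAssetExportSession with the AVAssetExportPresetAppleM4A preset, which writes an audio-only .m4a file. This is a minimal sketch that assumes the export session can read the asset's audio; if the asset genuinely exposes no audio track, the export will likely fail. The names exportAudio, audioOutputURL and AudioExportError are hypothetical, and output goes to the temporary directory by default. Note that exportAsynchronously(completionHandler:) may produce a deprecation warning on the newest SDKs.

```swift
import AVFoundation

/// Hypothetical error type for this sketch.
enum AudioExportError: Error {
    case cannotCreateSession
    case exportFailed(AVAssetExportSession.Status)
}

/// Builds the output path: the video's base name with an .m4a extension, inside `directory`.
func audioOutputURL(for videoURL: URL,
                    in directory: URL = FileManager.default.temporaryDirectory) -> URL {
    let baseName = videoURL.deletingPathExtension().lastPathComponent
    return directory.appendingPathComponent(baseName).appendingPathExtension("m4a")
}

/// Exports only the audio of a video file to an .m4a file and reports the output URL.
func exportAudio(from videoURL: URL, completion: @escaping (Result<URL, Error>) -> Void) {
    let asset = AVURLAsset(url: videoURL)
    guard let session = AVAssetExportSession(asset: asset,
                                             presetName: AVAssetExportPresetAppleM4A) else {
        completion(.failure(AudioExportError.cannotCreateSession))
        return
    }
    let outputURL = audioOutputURL(for: videoURL)
    // The export fails if a file already exists at the output URL, so remove any old copy.
    try? FileManager.default.removeItem(at: outputURL)
    session.outputURL = outputURL
    session.outputFileType = .m4a

    session.exportAsynchronously {
        switch session.status {
        case .completed:
            completion(.success(outputURL))
        default:
            let error: Error = session.error ?? AudioExportError.exportFailed(session.status)
            completion(.failure(error))
        }
    }
}
```

On success, the returned URL points at a local audio file that can be passed to an SFSpeechURLRecognitionRequest.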
