royherma

Reputation: 4203

SFSpeechRecognizer from Local Video

I'm trying to implement speech transcription (voice to text) from a video. My approach is breaking this down into 3 steps:

  1. Convert video to audio file (m4a/mp3)
  2. Pass audio to SFSpeechRecognizer request with audio file url
  3. Parse results
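
Steps 2 and 3 can be sketched with the Speech framework's URL-based request. This is a minimal sketch that assumes speech-recognition permission has already been granted via SFSpeechRecognizer.requestAuthorization(_:) and that the audio file from step 1 exists on disk. The function name transcribeAudioFile, the TranscriptionError type and the "en-US" default locale are hypothetical choices for illustration.

```swift
import Foundation
import Speech

// Hypothetical error type for this sketch.
enum TranscriptionError: Error {
    case recognizerUnavailable
}

/// Hypothetical helper: transcribes an audio file on disk (steps 2 and 3).
/// Assumes speech-recognition authorization was already granted.
@discardableResult
func transcribeAudioFile(at audioURL: URL,
                         locale: Locale = Locale(identifier: "en-US"),
                         completion: @escaping (Result<String, Error>) -> Void) -> SFSpeechRecognitionTask? {
    guard let recognizer = SFSpeechRecognizer(locale: locale), recognizer.isAvailable else {
        completion(.failure(TranscriptionError.recognizerUnavailable))
        return nil
    }

    // Step 2: point a URL-based recognition request at the extracted audio file.
    let request = SFSpeechURLRecognitionRequest(url: audioURL)
    // Only deliver the final result, not intermediate guesses.
    request.shouldReportPartialResults = false

    // Step 3: parse the result into a plain transcription string.
    return recognizer.recognitionTask(with: request) { result, error in
        if let error = error {
            completion(.failure(error))
        } else if let result = result, result.isFinal {
            completion(.success(result.bestTranscription.formattedString))
        }
    }
}
```

Turning off partial results keeps the handler simple: it only has to deal with the final transcription or an error.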

My issue is that I haven't found a way to convert the source video file (say, a .mov) into an audio-only file. The video's AVAsset doesn't report any audio tracks, yet the file plays with sound, so the audio does exist.
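
To confirm whether the asset really lacks an audio track, one option is to load the tracks explicitly. This is a minimal sketch that assumes iOS 15 / macOS 12 or later, where AVAsset.loadTracks(withMediaType:) is available, and a local video file. The helper name audioTracks(inVideoAt:) is hypothetical. As an unconfirmed guess about the cause, it is also worth checking that the asset was created from a file URL (URL(fileURLWithPath:)) rather than URL(string:) on a bare path.

```swift
import AVFoundation

/// Hypothetical helper: lists the audio tracks of a local video file.
@available(iOS 15.0, macOS 12.0, *)
func audioTracks(inVideoAt videoURL: URL) async throws -> [AVAssetTrack] {
    // videoURL should be a file URL, e.g. URL(fileURLWithPath:).
    let asset = AVURLAsset(url: videoURL)
    // Load the audio track list asynchronously before returning it.
    return try await asset.loadTracks(withMediaType: .audio)
}

// Usage (inside an async context):
// let tracks = try await audioTracks(inVideoAt: URL(fileURLWithPath: "/path/to/video.mov"))
// print("Audio tracks found: \(tracks.count)")
```

If this returns an empty array for a file that plays with sound, the problem lies with how the asset or its URL was created rather than with the later transcription steps.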

I imagine that once step 1 is solved, steps 2 and 3 are trivial, so my question is: what is the best way to convert a video file into an audio-only file that I can then use for transcription?
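
Staying within AVFoundation (an alternative to the FFmpegKit approach in the answer below), step 1 can be sketched with AVAssetExportSession and the AVAssetExportPresetAppleM4A preset, which writes an audio-only .m4a file. This sketch assumes the video contains an audio track that AVFoundation can read; if it doesn't, this export fails as well. The names extractAudio(fromVideoAt:to:completion:) and AudioExtractionError are hypothetical.

```swift
import AVFoundation

// Hypothetical error type for this sketch.
enum AudioExtractionError: Error {
    case cannotCreateExportSession
    case exportFailed(Error?)
}

/// Hypothetical helper: exports only the audio of a local video as .m4a (step 1).
func extractAudio(fromVideoAt videoURL: URL,
                  to outputURL: URL,
                  completion: @escaping (Result<URL, Error>) -> Void) {
    let asset = AVURLAsset(url: videoURL)
    // The Apple M4A preset produces an audio-only output file.
    guard let export = AVAssetExportSession(asset: asset,
                                            presetName: AVAssetExportPresetAppleM4A) else {
        completion(.failure(AudioExtractionError.cannotCreateExportSession))
        return
    }

    // The export fails if a file already exists at the destination, so remove it first.
    try? FileManager.default.removeItem(at: outputURL)
    export.outputURL = outputURL
    export.outputFileType = .m4a

    export.exportAsynchronously {
        switch export.status {
        case .completed:
            completion(.success(outputURL))
        default:
            completion(.failure(AudioExtractionError.exportFailed(export.error)))
        }
    }
}
```

The resulting outputURL can then be passed to SFSpeechURLRecognitionRequest(url:) for step 2.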

Upvotes: 1

Views: 896

Answers (1)

Yehor Smoliakov

Reputation: 354

You can use the FFmpegKit library to extract the audio part of the video.

Library usage example: https://github.com/tanersener/ffmpeg-kit/tree/main/apple#3-using

Example ffmpeg command for extracting audio: https://stackoverflow.com/a/27413824/5707560
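
Here is a minimal sketch of the FFmpegKit approach, assuming the ffmpegkit framework is already added to the project (see the library link above). The helper names audioExtractionCommand and extractAudioWithFFmpeg are hypothetical. The command re-encodes the audio to AAC (-c:a aac) instead of stream-copying it with -acodec copy. This is slower, but it does not depend on the source codec fitting into an .m4a container.

```swift
import Foundation
import ffmpegkit

/// Hypothetical helper: builds the ffmpeg arguments.
/// -y overwrites an existing output, -vn drops the video stream,
/// -c:a aac re-encodes the audio to AAC.
func audioExtractionCommand(input: URL, output: URL) -> String {
    // Paths are quoted so that spaces in file names don't split the arguments.
    return "-y -i \"\(input.path)\" -vn -c:a aac \"\(output.path)\""
}

/// Hypothetical helper: runs the extraction and reports success or failure.
func extractAudioWithFFmpeg(input: URL, output: URL,
                            completion: @escaping (Bool) -> Void) {
    let command = audioExtractionCommand(input: input, output: output)
    FFmpegKit.executeAsync(command) { session in
        // ReturnCode.isSuccess is true when ffmpeg exited with code 0.
        completion(ReturnCode.isSuccess(session?.getReturnCode()))
    }
}
```

The output .m4a file can then be passed to SFSpeechURLRecognitionRequest(url:) as in steps 2 and 3 of the question.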

Upvotes: 2
