codebyjakob

Reputation: 116

How to match text to audio in Python?

I have an audio file and a text that corresponds to the speech in this audio file.

Is there a way to align the text with the audio so that I get timestamps showing where each word of the text occurs in the audio?

Upvotes: 3

Views: 4236

Answers (2)

codebyjakob

Reputation: 116

So I have found exactly what I was looking for.

Apparently, the technique that matches a given text to an audio recording and returns the corresponding timestamps is called forced alignment.

Here is a very useful list of forced alignment tools: https://github.com/pettarin/forced-alignment-tools

Personally, I used aeneas, which worked really well for me.
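For anyone who wants to try it, here is a minimal sketch of how forced alignment with aeneas can be driven from Python, based on the usage shown in the aeneas README. The function names `align_with_aeneas` and `read_sync_map` are my own helpers, not part of aeneas, and the file paths are placeholders that should be absolute paths. As far as I know, with `is_text_type=plain` each line of the text file becomes one aligned fragment, so putting one word per line is one way to approximate word-level timestamps.

```python
import json


def align_with_aeneas(audio_path, text_path, sync_map_path, language="eng"):
    """Run aeneas forced alignment and write a JSON sync map (requires aeneas)."""
    # Imported lazily so read_sync_map below also works without aeneas installed.
    from aeneas.executetask import ExecuteTask
    from aeneas.task import Task

    # Plain text input (one fragment per line), JSON output.
    config = "task_language={}|is_text_type=plain|os_task_file_format=json".format(language)
    task = Task(config_string=config)
    task.audio_file_path_absolute = audio_path
    task.text_file_path_absolute = text_path
    task.sync_map_file_path_absolute = sync_map_path

    ExecuteTask(task).execute()   # compute the alignment
    task.output_sync_map_file()   # write the sync map to sync_map_path


def read_sync_map(sync_map_json):
    """Turn an aeneas-style JSON sync map into (begin_s, end_s, text) tuples."""
    data = json.loads(sync_map_json)
    return [
        (float(fragment["begin"]), float(fragment["end"]), " ".join(fragment["lines"]))
        for fragment in data["fragments"]
    ]
```

After calling `align_with_aeneas(...)`, reading the output file and passing its contents to `read_sync_map` gives a list of start time, end time, and text for each fragment.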

Upvotes: 6

Saurabh Pandey

Reputation: 549

Yes, that is possible. I am assuming you are familiar with basic audio terminology.

Check the approach described here: https://www.geeksforgeeks.org/python-speech-recognition-on-large-audio-files/

The library used there can read an audio file chunk by chunk. You can pass each chunk to speech-to-text conversion and collect the recognized text chunk by chunk.

Also, if the sample rate of the audio file is 44100 Hz, then a chunk of 8192 samples represents a time unit of about 186 milliseconds (8192 / 44100 ≈ 0.186 s).
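The arithmetic behind that figure can be sketched as follows, assuming the 8192 refers to samples per chunk. The helper names `chunk_duration_ms` and `chunk_start_ms` are hypothetical and only illustrate how a chunk index maps to a rough timestamp.

```python
def chunk_duration_ms(chunk_samples, sample_rate):
    """Duration of one chunk in milliseconds."""
    return 1000.0 * chunk_samples / sample_rate


def chunk_start_ms(chunk_index, chunk_samples, sample_rate):
    """Start time in milliseconds of the chunk at a zero-based index."""
    return chunk_index * chunk_duration_ms(chunk_samples, sample_rate)


print(round(chunk_duration_ms(8192, 44100), 2))  # 185.76
```

This only gives chunk-level timing: text recognized in chunk `i` starts somewhere after `chunk_start_ms(i, 8192, 44100)`, so the resolution is limited by the chunk size.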

Upvotes: 1
