Reputation: 145

How to compare two or more audio files and obtain timecodes where the audio is different?

I am working with Python. I have two audio files: one is the original and the other is an edited version (half original, half inserted audio).

I have used libraries like inaSpeechSegmenter and Speech Recognition. With these I can tell whether the audio differs in music, speech or no audio. Using inaSpeechSegmenter I can also find the time at which the audio first differs. However, I am unable to find the timecodes when the audio differs in more than one place.

Also I didn't find any API which can help me to resolve the problem.

I would appreciate some ideas and suggestions for this, thanks.

Upvotes: 3

Answers (1)

maarten

Reputation: 492

I'll discuss the case where two audio files are made up of regions that are either identical sample-by-sample, or they are different (e.g. inserted audio). For identifying the mismatching regions you don't need any advanced signal processing.

First you need to load the audio files into Python. If your audio files are '.wav' files you can use Python's built-in wave module. If you also need to deal with other types of audio files (ogg, flac), a good option is soundfile, which you can install through pip (note that it doesn't support mp3 files).

import soundfile
import numpy as np

signal_1, samplerate_1 = soundfile.read("audiofile_1.wav")
signal_2, samplerate_2 = soundfile.read("audiofile_2.wav")
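If you would rather stay with the standard library, here is a minimal sketch of loading through the wave module mentioned above. It assumes 16-bit PCM data, and the helper name read_wav_16bit is made up for illustration:

```python
import wave
import numpy as np

def read_wav_16bit(path):
    """Read a 16-bit PCM .wav file into an (n_frames, n_channels) int16 array."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            # Assumption of this sketch: 2 bytes per sample (16-bit PCM).
            raise ValueError("this sketch only handles 16-bit PCM")
        samplerate = wav_file.getframerate()
        n_channels = wav_file.getnchannels()
        raw = wav_file.readframes(wav_file.getnframes())
    # "<i2" = little-endian signed 16-bit integers, interleaved by channel.
    signal = np.frombuffer(raw, dtype="<i2").reshape(-1, n_channels)
    return signal, samplerate
```

Unlike soundfile.read, which returns floating-point samples by default, this returns the raw integer samples, and the array is always two-dimensional, even for mono files.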

Let's assume that samplerate_1 == samplerate_2 and that len(signal_1) == len(signal_2). You can locate the samplewise differences like this:

mismatch = (signal_1 != signal_2).astype(int)

This is an array the same size as your signals, and it has value 1 at the positions where the signals differ, and 0 elsewhere. Now if you are interested in the regions where the signals differ, you can find the positions where mismatch goes from 0 to 1 (the start of a mismatching region) and from 1 back to 0 (the end of a mismatching region), using the np.diff and the np.where functions:

region_starts = np.where(np.diff(np.r_[0, mismatch, 0]) == 1)[0]
region_ends = np.where(np.diff(np.r_[0, mismatch, 0]) == -1)[0]
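To see what the padding and np.diff do, here is a small worked example on a made-up mismatch array (the values are purely illustrative):

```python
import numpy as np

# Toy mismatch mask: samples 2-3 differ, and sample 6 differs.
mismatch = np.array([0, 0, 1, 1, 0, 0, 1, 0])

# Pad with zeros so regions touching either end are detected too.
edges = np.diff(np.r_[0, mismatch, 0])
region_starts = np.where(edges == 1)[0]
region_ends = np.where(edges == -1)[0]

print(region_starts)  # [2 6]
print(region_ends)    # [4 7]
```

Note that each end index is exclusive: the first region covers samples 2 and 3, and the reported end is 4.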

For the start/end positions to be correct the mismatch is padded with a leading and a trailing 0 (using np.r_[]). Now you can pair the start/end times of each region, and divide by the samplerate to get the timestamps in seconds:

mismatching_regions = np.column_stack((region_starts, region_ends))
mismatching_regions = mismatching_regions / samplerate_1
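For convenience, the steps can be wrapped into one function. This is only a sketch: the function name find_mismatching_regions is my own, and the handling of multi-channel audio (a frame counts as mismatching if any channel differs) is an assumption rather than something the steps above prescribe.

```python
import numpy as np

def find_mismatching_regions(signal_1, signal_2, samplerate):
    """Return an (n_regions, 2) array of [start, end) times in seconds."""
    signal_1 = np.asarray(signal_1)
    signal_2 = np.asarray(signal_2)
    if signal_1.shape != signal_2.shape:
        raise ValueError("signals must have the same shape")

    mismatch = signal_1 != signal_2
    if mismatch.ndim > 1:
        # Assumption: for multi-channel audio, flag a frame if any channel differs.
        mismatch = mismatch.any(axis=1)

    # Pad with zeros so regions at the very start or end are detected.
    edges = np.diff(np.r_[0, mismatch.astype(int), 0])
    region_starts = np.where(edges == 1)[0]
    region_ends = np.where(edges == -1)[0]

    regions = np.column_stack((region_starts, region_ends))
    # True division returns a new float array of timestamps in seconds.
    return regions / samplerate
```

Calling find_mismatching_regions(signal_1, signal_2, samplerate_1) then gives one row per differing region, and an empty array with shape (0, 2) when the files are identical.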

Upvotes: 5
