Reputation: 145
I am working with Python, and I have two audio files: one is the original and the other is an edited version (half original audio, half inserted audio).
I have used libraries like inaSpeechSegmenter and Speech Recognition. With these I can tell whether the audio differs in music, speech or silence. Using inaSpeechSegmenter, I can also find the time at which the audio first differs. However, I'm unable to find the timecodes when the audio differs in more than one place.
Also I didn't find any API which can help me to resolve the problem.
I would appreciate some ideas and suggestions for this, thanks.
Upvotes: 3
Views: 1416
Reputation: 492
I'll discuss the case where the two audio files are made up of regions that are either identical sample-by-sample or different (e.g. inserted audio). To identify the mismatching regions in this case, you don't need any advanced signal processing.
First you need to load the audio files into Python. If your audio files are '.wav' files you can use Python's built-in wave module. If you also need to deal with other types of audio files (ogg, flac), a good option is soundfile, which you can install through pip (note that mp3 support depends on the underlying libsndfile version; older releases don't support it).
import soundfile
import numpy as np
signal_1, samplerate_1 = soundfile.read("audiofile_1.wav")
signal_2, samplerate_2 = soundfile.read("audiofile_2.wav")
Let's assume that samplerate_1 == samplerate_2 and that len(signal_1) == len(signal_2). You can locate the samplewise differences like this:
mismatch = (signal_1 != signal_2).astype(int)
This is an array the same size as your signals. It has the value 1 at the positions where the signals differ, and 0 elsewhere. If you are interested in the regions where the signals differ, you can find the positions where mismatch goes from 0 to 1 (the start of a mismatching region) and from 1 back to 0 (the end of a mismatching region), using the np.diff and np.where functions:
region_starts = np.where(np.diff(np.r_[0, mismatch, 0]) == 1)[0]
region_ends = np.where(np.diff(np.r_[0, mismatch, 0]) == -1)[0]
For the start/end positions to be correct, mismatch is padded with a leading and a trailing 0 (using np.r_[]). Now you can pair the start/end positions of each region, and divide by the samplerate to get the timestamps in seconds:
mismatching_regions = np.column_stack((region_starts, region_ends))
mismatching_regions = mismatching_regions / samplerate_1
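The steps above can be pulled together into one function. This is only a minimal sketch: the function name find_mismatching_regions is made up for illustration, trimming both signals to their common length is an assumption (the answer above assumes equal lengths), and treating a multi-channel frame as mismatching when any channel differs is also an assumption. The toy arrays stand in for audio loaded with soundfile.read.

```python
import numpy as np


def find_mismatching_regions(signal_1, signal_2, samplerate):
    """Return an (n_regions, 2) array of [start, end) times in seconds.

    Hypothetical helper, not part of any library.
    """
    signal_1 = np.asarray(signal_1)
    signal_2 = np.asarray(signal_2)

    # Assumption: if the lengths differ, compare only the overlapping part.
    n = min(len(signal_1), len(signal_2))
    differs = signal_1[:n] != signal_2[:n]

    # Multi-channel audio has shape (frames, channels). Assumption: a frame
    # counts as mismatching if any of its channels differs.
    if differs.ndim > 1:
        differs = differs.any(axis=1)

    # Pad with 0 on both sides so regions touching either edge are found.
    edges = np.diff(np.r_[0, differs.astype(int), 0])
    starts = np.where(edges == 1)[0]   # first differing sample of a region
    ends = np.where(edges == -1)[0]    # one past the last differing sample

    # True division returns a new float array with times in seconds.
    return np.column_stack((starts, ends)) / samplerate


# Toy example: 10 samples at 2 Hz, differing at samples 2-3 and 7-9.
a = np.zeros(10)
b = a.copy()
b[2:4] = 1.0
b[7:10] = 1.0
regions = find_mismatching_regions(a, b, samplerate=2)
print(regions)  # two regions: 1.0 s to 2.0 s and 3.5 s to 5.0 s
```

With real files you would pass signal_1, signal_2 and samplerate_1 from the soundfile.read calls instead of the toy arrays. Keep in mind that this only works when the unedited parts stay sample-aligned: if the inserted audio shifts everything after it, all later samples will be reported as different.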
Upvotes: 5