kRazzy R

Reputation: 1589

Incorrect audio file length in plot and improperly overlaid annotation segments on audio plot in Python

I am following this tutorial (https://github.com/amsehili/audio-segmentation-by-classification-tutorial/blob/master/multiclass_audio_segmentation.ipynb) and trying to recreate the visualisation outputs using my own training data and samples.

My audio file, which is 31 seconds long, is here:
https://www.dropbox.com/s/qae2u5dnnp678my/test_hold.wav?dl=0
The annotation files are here:
https://www.dropbox.com/s/gm9uu1rjettm3qr/hold.lst?dl=0
https://www.dropbox.com/s/b6z1gt8i63c8ted/tring.lst?dl=0
I am trying to plot the audio file's waveform in Python and then highlight the "hold" and "tring" sections of that audio, taken from the annotation files, on top of the waveform.

The waveform from Audacity is as follows: [image: Audacity screenshot of the waveform]

The code is as follows:

import wave
import pickle
import numpy as np
from sklearn.mixture import GMM
import librosa

import warnings
warnings.filterwarnings('ignore')
SAMPLING_RATE =16000
wfp = wave.open("/home/vivek/Music/test_hold.wav")
audio_data = wfp.readframes(-1)
width = wfp.getsampwidth()
wfp.close()

# data as numpy array will be used to plot signal
fmt = {1: np.int8 , 2: np.int16, 4: np.int32}
signal = np.array(np.frombuffer(audio_data, dtype=fmt[width]), dtype=np.float64)


%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.pylab as pylab
pylab.rcParams['figure.figsize'] = 24, 18

def plot_signal_and_segmentation(signal, sampling_rate, segments=[]):
    _time = np.arange(0., np.ceil(float(len(signal))) / sampling_rate, 1./sampling_rate )
    if len(_time) > len(signal):
        _time = _time[: len(signal) - len(_time)]

    pylab.subplot(211)

    for seg in segments:

        fc = seg.get("fc", "g")
        ec = seg.get("ec", "b")
        lw = seg.get("lw", 2)
        alpha = seg.get("alpha", 0.4)

        ts = seg["timestamps"]

        # plot first segmentation outside loop to show one single legend for this class
        p = pylab.axvspan(ts[0][0], ts[0][1], fc=fc, ec=ec, lw=lw, alpha=alpha, label = seg.get("title", ""))

        for start, end in ts[1:]:
            p = pylab.axvspan(start, end, fc=fc, ec=ec, lw=lw, alpha=alpha)


    pylab.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
            borderaxespad=0., fontsize=22, ncol=2)

    pylab.plot(_time, signal)

    pylab.xlabel("Time (s)", fontsize=22)
    pylab.ylabel("Signal Amplitude", fontsize=22)
    pylab.show()

annotations = {}


ts = [line.rstrip("\r\n\t ").split(" ") for line in  open("/home/vivek/Music/hold.lst").readlines()]
ts = [(float(t[0]), float(t[1])) for t in ts]
annotations["hold"] = {"fc" : "y", "ec" : "y", "lw" : 0, "alpha" : 0.4, "title" : "Hold", "timestamps" : ts}

ts = [line.rstrip("\r\n\t ").split(" ") for line in  open("/home/vivek/Music/tring.lst").readlines()]
ts = [(float(t[0]), float(t[1])) for t in ts]
annotations["tring"] = {"fc" : "r", "ec" : "r", "lw" : 0, "alpha" : 0.9, "title" : "Tring", "timestamps" : ts}


def plot_annot():
    plot_signal_and_segmentation(signal, SAMPLING_RATE,
                             [annotations["tring"],
                              annotations["hold"]])

plot_annot()  

The plot generated by the above code is: [image: generated plot]

As you can see, the plot seems to think that the file is 90 seconds long when in fact it is only 31 seconds long. Also, the annotation segments are wrongly overlaid/highlighted.

What am I doing wrong and how do I fix it?

PS: In the waveform, the rectangular block is the "tring"; the other four "trapezoidal" waveforms are the regions of hold music.

Upvotes: 1

Views: 259

Answers (1)

jaket

Reputation: 9341

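Just a wild guess here. The Audacity screenshot shows a sample rate of 44100 Hz, but your code snippet has a SAMPLING_RATE variable initialized to 16000. Your time axis is built by dividing the number of samples by that rate, so a rate that is too low stretches the axis. If you take your original 31 seconds and multiply it by the ratio between the two rates, you get 31*44100/16000 = 85.44 seconds, which is roughly the length your plot shows.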
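If that guess is right, one way to avoid the mismatch is to read the sampling rate from the WAV header instead of hard-coding it. The following is only a minimal sketch: `load_wav` and `duration_seconds` are hypothetical helper names, and keeping just the first channel of a multi-channel file is my own assumption (the question does not say whether the file is mono or stereo).

```python
import wave

import numpy as np


def load_wav(path):
    """Return (signal, sampling_rate), with the rate taken from the file header.

    Hypothetical helper, not part of the tutorial.
    """
    with wave.open(path, "rb") as wfp:
        sampling_rate = wfp.getframerate()   # e.g. 44100, read from the header
        n_channels = wfp.getnchannels()
        width = wfp.getsampwidth()
        raw = wfp.readframes(wfp.getnframes())

    # 8-bit WAV samples are unsigned; 16- and 32-bit samples are signed
    fmt = {1: np.uint8, 2: np.int16, 4: np.int32}
    samples = np.frombuffer(raw, dtype=fmt[width]).astype(np.float64)

    # Channels are interleaved: reshape to (frames, channels) and keep the
    # first channel (assumption: one channel is enough for plotting)
    signal = samples.reshape(-1, n_channels)[:, 0]
    return signal, sampling_rate


def duration_seconds(signal, sampling_rate):
    """Length of the time axis implied by a given sampling rate."""
    return len(signal) / float(sampling_rate)
```

You would then call `signal, rate = load_wav("/home/vivek/Music/test_hold.wav")` and pass `rate` to `plot_signal_and_segmentation` in place of `SAMPLING_RATE`. Since the annotation timestamps are in seconds, they should line up once the time axis is correct, assuming they were made against the same recording.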

Upvotes: 2
