Vrishabh Lakhani

Reputation: 25

How to extract data from a wav file using python matplotlib library?

I'm trying to extract data from a WAV file for audio analysis: the amplitude of each frequency with respect to time. My aim is to feed this data to a machine learning algorithm for a college project. After a bit of googling I found that this can be done with Python's matplotlib library. I saw some sample code that runs a short-time Fourier transform and plots a spectrogram of a WAV file, but I wasn't able to understand how to use this library to extract the data (every frequency's amplitude at a given time in the audio file) and store it in a 3D array or a .mat file. Here's the code I saw on some website:

#!/usr/bin/env python

""" This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Frank Zalkow, 2012-2013 """

import numpy as np
from matplotlib import pyplot as plt
import scipy.io.wavfile as wav
from numpy.lib import stride_tricks

""" short time fourier transform of audio signal """
def stft(sig, frameSize, overlapFac=0.5, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))

    # zeros at beginning (thus center of 1st window should be for sample nr. 0)
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)
    # cols for windowing (must be an int to be usable as an array shape)
    cols = int(np.ceil((len(samples) - frameSize) / float(hopSize))) + 1
    # zeros at end (thus samples can be fully covered by frames)
    samples = np.append(samples, np.zeros(frameSize))

    frames = stride_tricks.as_strided(samples, shape=(cols, frameSize), strides=(samples.strides[0]*hopSize, samples.strides[0])).copy()
    frames *= win


    return np.fft.rfft(frames)    

""" scale frequency axis logarithmically """    
def logscale_spec(spec, sr=44100, factor=20.):
    timebins, freqbins = np.shape(spec)

    scale = np.linspace(0, 1, freqbins) ** factor
    scale *= (freqbins-1)/max(scale)
    scale = np.unique(np.round(scale)).astype(int)  # ints, since scale is used as slice indices below

    # create spectrogram with new freq bins
    newspec = np.complex128(np.zeros([timebins, len(scale)]))
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            newspec[:,i] = np.sum(spec[:,scale[i]:], axis=1)
        else:        
            newspec[:,i] = np.sum(spec[:,scale[i]:scale[i+1]], axis=1)

    # list center freq of bins
    allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1])
    freqs = []
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            freqs += [np.mean(allfreqs[scale[i]:])]
        else:
            freqs += [np.mean(allfreqs[scale[i]:scale[i+1]])]

    return newspec, freqs

""" plot spectrogram"""
def plotstft(audiopath, binsize=2**10, plotpath=None, colormap="jet"):
    samplerate, samples = wav.read(audiopath)
    s = stft(samples, binsize)

    sshow, freq = logscale_spec(s, factor=1.0, sr=samplerate)
    ims = 20.*np.log10(np.abs(sshow)/10e-6) # amplitude to decibel

    timebins, freqbins = np.shape(ims)

    plt.figure(figsize=(15, 7.5))
    plt.imshow(np.transpose(ims), origin="lower", aspect="auto", cmap=colormap, interpolation="none")
    plt.colorbar()

    plt.xlabel("time (s)")
    plt.ylabel("frequency (hz)")
    plt.xlim([0, timebins-1])
    plt.ylim([0, freqbins])

    xlocs = np.float32(np.linspace(0, timebins-1, 5))
    plt.xticks(xlocs, ["%.02f" % l for l in ((xlocs*len(samples)/timebins)+(0.5*binsize))/samplerate])
    ylocs = np.int16(np.round(np.linspace(0, freqbins-1, 10)))
    plt.yticks(ylocs, ["%.02f" % freq[i] for i in ylocs])

    if plotpath:
        plt.savefig(plotpath, bbox_inches="tight")
    else:
        plt.show()

    plt.clf()
plotstft("abc.wav")

Please help me understand how to extract the data. If matplotlib is not the right tool, please recommend another library that will help me achieve this.

Upvotes: 0

Views: 3498

Answers (2)

Frank Zalkow

Reputation: 3930

First of all, this looks like my code, which is stated to be under a CC license. I don't take it too seriously, but you should not ignore those aspects (you omitted the statement of authorship in this case); others could be more miffed about such a thing.

To your question: in this code the STFT isn't computed by matplotlib, just by numpy. You can get it like this:

samplerate, samples = wav.read(audiopath)
s = stft(samples, 1024)

I am not sure why you want a 3D array. It is a 2D array (time frames by frequency bins), but it is complex-valued. If you want to save it in a .mat file:

from scipy.io import savemat
savemat("file.mat", {'arr': s})
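If what you are after is the amplitude of each frequency at each time, here is a minimal sketch. It assumes an array laid out like the one stft returns, with shape (time frames, frequency bins). The helper name stft_axes, the synthetic 440 Hz tone, and the inline framing that stands in for stft are illustrative assumptions, not part of the original code. The frame times assume frame k is centred on sample k * hop size, which is what the half-frame zero padding in stft aims for.

```python
import numpy as np

def stft_axes(num_frames, frame_size, samplerate, overlap_fac=0.5):
    """Hypothetical helper: time (s) and frequency (Hz) axes for an STFT
    framed like stft() above (50% overlap by default)."""
    hop_size = int(frame_size - np.floor(overlap_fac * frame_size))
    times = np.arange(num_frames) * hop_size / float(samplerate)
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / samplerate)
    return times, freqs

# Synthetic test signal (assumption): one second of a 440 Hz sine at 8 kHz.
samplerate = 8000
frame_size = 1024
hop = frame_size // 2
t = np.arange(samplerate) / float(samplerate)
tone = np.sin(2 * np.pi * 440.0 * t)

# Stand-in for s = stft(samples, frame_size): same padding and framing idea.
padded = np.concatenate([np.zeros(frame_size // 2), tone])
num_frames = int(np.ceil((len(padded) - frame_size) / float(hop))) + 1
padded = np.concatenate([padded, np.zeros(frame_size)])
frames = np.stack([padded[k * hop:k * hop + frame_size] for k in range(num_frames)])
s = np.fft.rfft(frames * np.hanning(frame_size))

# The magnitude of the complex STFT is the amplitude per (time, frequency).
amplitude = np.abs(s)
times, freqs = stft_axes(s.shape[0], frame_size, samplerate)

# Strongest frequency bin in a frame from the middle of the signal.
peak_bin = amplitude[num_frames // 2].argmax()
print("peak near %.1f Hz at t = %.3f s" % (freqs[peak_bin], times[num_frames // 2]))
```

So amplitude[i, j] is the amplitude at time times[i] and frequency freqs[j]. To keep the axes with the data, the same savemat call can store all three, e.g. savemat("file.mat", {'amplitude': amplitude, 'times': times, 'freqs': freqs}).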

Upvotes: 1

Scott Stensland

Reputation: 28285

You can see that once the WAV file is read into the variable samples, it is passed to a function called stft:

samplerate, samples = wav.read(audiopath)
s = stft(samples, binsize)

At this point you already have access to the audio samples in the variable samples, in the form of integers. Be aware that bit depth determines the number of bytes per sample, and know your endianness (the byte order, little-endian or big-endian). Inside the function stft, that array is further processed into an array of floats in the variable frames before it is passed into np.fft.rfft.

Depending on your needs, those are your access choices without doing any processing of your own.
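As a rough illustration of the bit-depth point, the sketch below scales integer PCM samples to floats in [-1, 1) based on the array's dtype. The helper name to_float is hypothetical. It assumes signed integer samples, such as the int16 or int32 arrays that scipy.io.wavfile.read returns for 16-bit and 32-bit PCM files; 8-bit WAV data is unsigned and would need an offset first.

```python
import numpy as np

def to_float(samples):
    """Hypothetical helper: scale signed-integer PCM samples to float64 in [-1.0, 1.0)."""
    if np.issubdtype(samples.dtype, np.floating):
        return samples.astype(np.float64)  # already float, pass through unchanged
    if not np.issubdtype(samples.dtype, np.signedinteger):
        raise TypeError("expected signed integer or float samples")
    # Full scale depends on the bit depth, e.g. 32768 for int16.
    full_scale = float(np.iinfo(samples.dtype).max) + 1.0
    return samples.astype(np.float64) / full_scale

# Made-up 16-bit sample values, just to show the scaling.
pcm16 = np.array([0, 16384, -32768, 32767], dtype=np.int16)
scaled = to_float(pcm16)
print(scaled)
```

Normalising like this makes the values comparable across files with different bit depths before they go into stft or a learning algorithm.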

Upvotes: 0
