Reputation: 25
I'm trying to extract data from an wav file for audio analysis of each frequency and their amplitude with respect to time, my aim to run this data for a machine learning algorithm for a college project, after a bit of googling I found out that this can be done by Python's matplotlib library, I saw some sample codes that ran a Short Fourier transform and plotted a spectrogram of these wav files but wasn't able to understand how to use this library to extract data (all frequency's amplitude at a given time in the audio file) and store it in an 3D array or a .mat file. Here's the code I saw on some website:
#!/usr/bin/env python
""" This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Frank Zalkow, 2012-2013 """
import numpy as np
from matplotlib import pyplot as plt
import scipy.io.wavfile as wav
from numpy.lib import stride_tricks
""" short time fourier transform of audio signal """
def stft(sig, frameSize, overlapFac=0.5, window=np.hanning):
win = window(frameSize)
hopSize = int(frameSize - np.floor(overlapFac * frameSize))
# zeros at beginning (thus center of 1st window should be for sample nr. 0)
samples = np.append(np.zeros(np.floor(frameSize/2.0)), sig)
# cols for windowing
cols = np.ceil( (len(samples) - frameSize) / float(hopSize)) + 1
# zeros at end (thus samples can be fully covered by frames)
samples = np.append(samples, np.zeros(frameSize))
frames = stride_tricks.as_strided(samples, shape=(cols, frameSize), strides=(samples.strides[0]*hopSize, samples.strides[0])).copy()
frames *= win
return np.fft.rfft(frames)
""" scale frequency axis logarithmically """
def logscale_spec(spec, sr=44100, factor=20.):
timebins, freqbins = np.shape(spec)
scale = np.linspace(0, 1, freqbins) ** factor
scale *= (freqbins-1)/max(scale)
scale = np.unique(np.round(scale))
# create spectrogram with new freq bins
newspec = np.complex128(np.zeros([timebins, len(scale)]))
for i in range(0, len(scale)):
if i == len(scale)-1:
newspec[:,i] = np.sum(spec[:,scale[i]:], axis=1)
else:
newspec[:,i] = np.sum(spec[:,scale[i]:scale[i+1]], axis=1)
# list center freq of bins
allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1])
freqs = []
for i in range(0, len(scale)):
if i == len(scale)-1:
freqs += [np.mean(allfreqs[scale[i]:])]
else:
freqs += [np.mean(allfreqs[scale[i]:scale[i+1]])]
return newspec, freqs
""" plot spectrogram"""
def plotstft(audiopath, binsize=2**10, plotpath=None, colormap="jet"):
samplerate, samples = wav.read(audiopath)
s = stft(samples, binsize)
sshow, freq = logscale_spec(s, factor=1.0, sr=samplerate)
ims = 20.*np.log10(np.abs(sshow)/10e-6) # amplitude to decibel
timebins, freqbins = np.shape(ims)
plt.figure(figsize=(15, 7.5))
plt.imshow(np.transpose(ims), origin="lower", aspect="auto", cmap=colormap, interpolation="none")
plt.colorbar()
plt.xlabel("time (s)")
plt.ylabel("frequency (hz)")
plt.xlim([0, timebins-1])
plt.ylim([0, freqbins])
xlocs = np.float32(np.linspace(0, timebins-1, 5))
plt.xticks(xlocs, ["%.02f" % l for l in ((xlocs*len(samples)/timebins)+(0.5*binsize))/samplerate])
ylocs = np.int16(np.round(np.linspace(0, freqbins-1, 10)))
plt.yticks(ylocs, ["%.02f" % freq[i] for i in ylocs])
if plotpath:
plt.savefig(plotpath, bbox_inches="tight")
else:
plt.show()
plt.clf()
plotstft("abc.wav")
Please guide me to understand how to extract the data, if not by matplotlib, recommend me some other library which will help me achieve this.
Upvotes: 0
Views: 3498
Reputation: 3930
First of all, this looks like my code which is stated to be under a CC license. I don't take it too serious, but you should not ignore those aspects (you omitted the statement of authorship in this case), others could be more miffed about such a thing.
To your question: In this code the stft isn't computed by matplotlib, but just by numpy. You can get it like this:
samplerate, samples = wav.read(audiopath)
s = stft(samples, 1024)
I am not sure why you want a 3D array? It is a 2D-array, but it is complex valued. If you want to save it in a .mat file:
from scipy.io import savemat
savemat("file.mat", {'arr': s})
Upvotes: 1
Reputation: 28285
You can see once the wav audio file is read into variable samples it is passed to a function called stft :
samplerate, samples = wav.read(audiopath)
s = stft(samples, binsize)
here you already have access to the audio samples in var samples in the form of integers ... be aware that bit depth will impact number of bytes per sample as represented as a series of integers ... also know your endianness (left to right or visa versa) ... however in function stft that array is further processed into an array of floats in variable : frames before its passed into function np.fft.rfft
Depending on your needs those are your access choices without doing any of your own processing
Upvotes: 0