Spectrogram in python using numpy

Question

I need to make spectrogram using numpy. I take 1s of audio and split it into 0.02s chunks. Then I calculate FFT using numpy and put it back together into one image. Results are poor.

Here is spectrogram generated using matplotlib specgram function:

And here is my 'spectrogram':

Here is my code:

spect_frags = []
transform = []

for x in range(0, 8000, 160):
  spect_frags.append(spect_sample[x:x + 160])

for sample in spect_frags:
  transform.append(abs(np.fft.fft(sample).real)[0:np.fft.fft(sample).real.size//4])

I cut out 3/4 of frequencies beacuse I don't need them for now. I have no clue why there is so much diffrence in resolution. How can I improve it?

jlandercy · Accepted Answer

Spectrogram MCVE

You can recreate a crude estimate of specgram with the following code:

import numpy as np
from scipy.io import wavfile
from scipy import fft
import matplotlib.pyplot as plt

# Read some sample file (replace with your data):
rate, data = wavfile.read('./data/aaaah.wav')
# rate=48000, data.shape=(46447, 2) ~ almost 1s of stereo signal

# Spectrogram estimation:
N = 256
S = []
for k in range(0, data.shape[0]+1, N):
    x = fft.fftshift(fft.fft(data[k:k+N,0], n=N))[N//2:N]
    # assert np.allclose(np.imag(x*np.conj(x)), 0)
    Pxx = 10*np.log10(np.real(x*np.conj(x)))
    S.append(Pxx)
S = np.array(S)

# Frequencies:
f = fft.fftshift(fft.fftfreq(N, d=1/rate))[N//2:N]
# array([    0. ,   187.5,   375. , ..., 23625. , 23812.5])

# Spectrogram rendering:
plt.imshow(S.T, origin='lower')

It outputs:

When specgram renders:

_ = plt.specgram(data[:,0])

This MCVE differs from specgram because axes should be scaled to properly reflect time and frequencies and there is no moving windowing. More precisely:

x-axis stands for time chunck index of length N=256;
y-axis is positive half plane FFT index (N//2=128), notice the use of fftshift to assemble the spectrum after fft;
real frequencies are available using sampling rate and fftfreq, in specgram it ranges from 0 to 1 as this method is not necessarily aware of the signal sampling rate;
there is no window overlap (independent contiguous chuncks are used), this is why the MCVE is a bit less smooth than specgram.

Power estimate

Also notice than taking real part of complex number is not the same as taking the magnitude. Mainly, when you write:

abs(np.fft.fft(sample).real)

You are not taking the norm of complex number, but you totally remove the complex part because of the .real call.

You should estimate the power using product of conjugates:

10*np.log10(np.real(x*np.conj(x)))

Then use abs to transform complex type (or just keep the real part as the complex part must be null) into float. Finally, you can scale in Decibel using decimal logarithm.

Sanity check

You can check the result of FFT is indeed of complex type with a significative complex part (removing it leads to information loss):

x
# array([-1.56000000e+02-0.00000000e+00j, -3.94271344e+01+1.17935735e+02j,
#         4.03754070e+01+4.14695163e+01j,  1.71510716e+01+1.26920718e+01j,
#         2.15523795e+01-2.07362424e+00j, -3.03847433e+00-1.22767815e+01j,
#        -4.56347533e+00-7.36380957e-01j, -1.28048283e+01-6.80931256e+00j,
#        -2.22781473e+01+1.12096897e+01j, -1.13788549e+01+2.54314337e+01j,
#        ...])

And the product of conjugates do have a null complex part (but is still of complex type):

x*np.conj(x)
# array([2.43360000e+04+0.j, 1.54633365e+04+0.j, 3.34989427e+03+0.j,
#        4.55247945e+02+0.j, 4.68804979e+02+0.j, 1.59951690e+02+0.j,
#        2.13675640e+01+0.j, 2.10330365e+02+0.j, 6.21972990e+02+0.j,
#        7.76236159e+02+0.j, 1.05846430e+03+0.j, 6.54663598e+02+0.j,
#        6.95792718e+01+0.j, 6.03013130e+01+0.j, 1.11620428e+01+0.j,
#        ...])

You can ensure this is always true (sanity check), by asserting the following:

assert np.allclose(np.imag(x*np.conj(x)), 0)

Spectrogram in python using numpy

Answers (2)

Spectrogram MCVE

Power estimate

Sanity check

Related Questions