Reputation: 565
I need to make spectrogram using numpy. I take 1s of audio and split it into 0.02s chunks. Then I calculate FFT using numpy and put it back together into one image. Results are poor.
Here is spectrogram generated using matplotlib specgram function:
And here is my 'spectrogram':
Here is my code:
spect_frags = []
transform = []
for x in range(0, 8000, 160):
spect_frags.append(spect_sample[x:x + 160])
for sample in spect_frags:
transform.append(abs(np.fft.fft(sample).real)[0:np.fft.fft(sample).real.size//4])
I cut out 3/4 of frequencies beacuse I don't need them for now. I have no clue why there is so much diffrence in resolution. How can I improve it?
Upvotes: 2
Views: 14244
Reputation: 41
For efficient use with JAX I found it useful to adapt @jlandercy's solution to avoid explicit loops and to add some simple Hann windowing. I gave up support for stereo input in the process, although I'm fairly certain that the whole point of computing STFT is that the nonlinearity of Fourier transforms means I'm destroying some information by collapsing stereo signals to mono before carrying this out.
I also did not extend their answer to include overlapping windows in the interest of preserving its simplicity and concision. There's probably some numpy sorceress out there who knows how to do efficient np.concatenate
to stitch together or interpolate slices of nonoverlapping signal samples from the innermost dimensions of wins
or some such. Regrettably, I myself have no such knowledge of the dark arts, so that functionality has been elided.
import matplotlib.pyplot as plt
import jax.numpy as jnp
def stft(a, n_fft=128, window=jnp.hanning):
n = n_fft
rpad = n - a.shape[-1] % n
wins = jnp.pad(a, (0, rpad)).reshape(-1, n) * window(n)
fftc = jnp.fft.fftshift(jnp.fft.fft(wins, n=n))[..., n // 2 : n]
fftr = jnp.real(fftc * jnp.conj(fftc))
return fftr
audio = # buffer single-channel floating-point samples from somewhere...
_ = plt.imshow(stft(audio, 1024).T[:, -512:], cmap="viridis")
Upvotes: 2
Reputation: 11072
You can recreate a crude estimate of specgram
with the following code:
import numpy as np
from scipy.io import wavfile
from scipy import fft
import matplotlib.pyplot as plt
# Read some sample file (replace with your data):
rate, data = wavfile.read('./data/aaaah.wav')
# rate=48000, data.shape=(46447, 2) ~ almost 1s of stereo signal
# Spectrogram estimation:
N = 256
S = []
for k in range(0, data.shape[0]+1, N):
x = fft.fftshift(fft.fft(data[k:k+N,0], n=N))[N//2:N]
# assert np.allclose(np.imag(x*np.conj(x)), 0)
Pxx = 10*np.log10(np.real(x*np.conj(x)))
S.append(Pxx)
S = np.array(S)
# Frequencies:
f = fft.fftshift(fft.fftfreq(N, d=1/rate))[N//2:N]
# array([ 0. , 187.5, 375. , ..., 23625. , 23812.5])
# Spectrogram rendering:
plt.imshow(S.T, origin='lower')
It outputs:
When specgram
renders:
_ = plt.specgram(data[:,0])
This MCVE differs from specgram
because axes should be scaled to properly reflect time and frequencies and there is no moving windowing. More precisely:
N=256
;N//2=128
), notice the use of fftshift
to assemble the spectrum after fft
;fftfreq
, in specgram
it ranges from 0 to 1 as this method is not necessarily aware of the signal sampling rate;specgram
.Also notice than taking real part of complex number is not the same as taking the magnitude. Mainly, when you write:
abs(np.fft.fft(sample).real)
You are not taking the norm of complex number, but you totally remove the complex part because of the .real
call.
You should estimate the power using product of conjugates:
10*np.log10(np.real(x*np.conj(x)))
Then use abs
to transform complex
type (or just keep the real
part as the complex part must be null) into float
. Finally, you can scale in Decibel using decimal logarithm.
You can check the result of FFT is indeed of complex type with a significative complex part (removing it leads to information loss):
x
# array([-1.56000000e+02-0.00000000e+00j, -3.94271344e+01+1.17935735e+02j,
# 4.03754070e+01+4.14695163e+01j, 1.71510716e+01+1.26920718e+01j,
# 2.15523795e+01-2.07362424e+00j, -3.03847433e+00-1.22767815e+01j,
# -4.56347533e+00-7.36380957e-01j, -1.28048283e+01-6.80931256e+00j,
# -2.22781473e+01+1.12096897e+01j, -1.13788549e+01+2.54314337e+01j,
# ...])
And the product of conjugates do have a null complex part (but is still of complex
type):
x*np.conj(x)
# array([2.43360000e+04+0.j, 1.54633365e+04+0.j, 3.34989427e+03+0.j,
# 4.55247945e+02+0.j, 4.68804979e+02+0.j, 1.59951690e+02+0.j,
# 2.13675640e+01+0.j, 2.10330365e+02+0.j, 6.21972990e+02+0.j,
# 7.76236159e+02+0.j, 1.05846430e+03+0.j, 6.54663598e+02+0.j,
# 6.95792718e+01+0.j, 6.03013130e+01+0.j, 1.11620428e+01+0.j,
# ...])
You can ensure this is always true (sanity check), by asserting the following:
assert np.allclose(np.imag(x*np.conj(x)), 0)
Upvotes: 5