Reputation: 73
I have a small problem handling FFT data. I looked through many examples of how to do an FFT, but I couldn't get what I want from any of them. I have an arbitrary wave file with a 44 kHz sample rate and I want to get the magnitude of N harmonics every X ms; let's say 100 ms should be enough. I tried this code:
import scipy.io.wavfile as wavfile
import numpy as np
import pylab as pl
rate, data = wavfile.read("sound.wav")
t = np.arange(len(data[:,0]))*1.0/rate
p = 20*np.log10(np.abs(np.fft.rfft(data[:2048, 0])))
f = np.linspace(0, rate/2.0, len(p))
pl.plot(f, p)
pl.xlabel("Frequency(Hz)")
pl.ylabel("Power(dB)")
pl.show()
This was the last example I tried; I found it somewhere on Stack Overflow. The problem is that it gives the magnitude I want and the frequency, but no time at all. As far as I know, FFT analysis is 3D, and this is a "merged" result of all harmonics. I get this:
X-axis = Frequency, Y-axis = Magnitude, Z-axis = Time (invisible)
From my understanding of the code, t is time. That seems right, but it isn't used anywhere in the code, though we may need it. p is an array of powers (or magnitudes), but it seems to be some average of all magnitudes for each frequency in f, which is the array of frequencies. I don't want an average/merged value; I want the magnitude of N harmonics every X milliseconds.
Long story short, we can get: a single magnitude for each frequency.
We want: all magnitudes of N frequencies, including the time at which each magnitude occurs.
The result should look like this array: [time, frequency, amplitude]. So in the end, if we want 3 harmonics, it would look like:
[0,100,2.85489] #100Hz harmonic has 2.85489 amplitude on 0ms
[0,200,1.15695] #200Hz ...
[0,300,3.12215]
[100,100,1.22248] #100Hz harmonic has 1.22248 amplitude on 100ms
[100,200,1.58758]
[100,300,2.57578]
[200,100,5.16574]
[200,200,3.15267]
[200,300,0.89987]
Visualization is not needed, result should be just arrays (or hashes/dictionaries) as listed above.
Upvotes: 3
Views: 12482
Reputation: 73
Edit: Oh, so it seems this returns values, but they don't match the audio file at all. Even though they can be used as magnitudes in a spectrogram, they won't work, for example, in the classic audio visualizers you see in many music players. I also tried matplotlib's pylab for the spectrogram, but the result is the same.
import os
import wave
import pylab
import math
from numpy import amax
from numpy import amin
def get_wav_info(wav_file,mi,mx):
    wav = wave.open(wav_file, 'r')
    frames = wav.readframes(-1)
    sound_info = pylab.fromstring(frames, 'Int16')
    frame_rate = wav.getframerate()
    wav.close()
    spectrum, freqs, t, im = pylab.specgram(sound_info, NFFT=1024, Fs=frame_rate)
    n = 0
    while n < 20:
        for index,power in enumerate(spectrum[n]):
            print("%s,%s,%s" % (n,int(round(t[index]*1000)),math.ceil(power*100)/100))
        n += 1
get_wav_info("wave.wav",1,20)
Any tips on how to obtain dB values that are usable in a visualization?
Basically, we apparently have all we need from the code above; the question is just how to make it return sensible values. Ignore mi and mx, as these only rescale the values in the array to fit into the mi..mx interval for visualization. If I am correct, spectrum in this code is an array of arrays containing the amplitudes for each frequency in the freqs array at the times given in the t array. But how do the values work: are they really amplitudes if they look this odd, and if so, how do I convert them to dB, for example?
tl;dr I need output like the visualizers in music players use, but it doesn't have to work in real time; I just want the data, and the values I get don't match the wav file.
Edit2: I noticed one more issue. For a 90-second wav, the t array contains times up to 175.x, which seems very odd considering that frame_rate matches the wav file. So now we have 2 problems: spectrum doesn't seem to return correct values (maybe they will fit once the time is correct), and t seems to span exactly double the duration of the wav.
Fixed: Case completely solved.
import os
import pylab
import math
from numpy import amax
from numpy import amin
from scipy.io import wavfile
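# wav_file is the path to the .wav file (not defined in this snippet)
frame_rate, snd = wavfile.read(wav_file)
# keep only the first channel of the (frames, channels) array
sound_info = snd[:,0]
spectrum, freqs, t, im = pylab.specgram(sound_info,NFFT=1024,Fs=frame_rate,noverlap=5,mode='magnitude')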
specgram needed a little adjustment, and I loaded only one channel using scipy.io.wavfile (instead of the wave module). Also, without mode='magnitude', specgram defaults to a power spectral density, which corresponds to 10*log10 rather than 20*log10 on a dB scale; that is why the values did not look right.
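A likely explanation for the doubled time axis, assuming the file is 16-bit stereo: the wave module returns the samples of both channels interleaved (left, right, left, right, ...), so treating that buffer as one mono signal makes it look twice as long. The sketch below shows how such a buffer could be split into channels with NumPy, plus a small helper that turns linear magnitudes into dB with 20*log10. The names split_channels and magnitude_to_db and the tiny test buffer are made up for illustration.

```python
import numpy as np

def split_channels(raw_bytes, n_channels):
    """Turn interleaved 16-bit little-endian PCM bytes into shape (frames, channels)."""
    samples = np.frombuffer(raw_bytes, dtype="<i2")
    return samples.reshape(-1, n_channels)

def magnitude_to_db(magnitude, floor=1e-12):
    """Convert linear magnitudes to dB; the floor avoids log10(0)."""
    return 20.0 * np.log10(np.maximum(magnitude, floor))

# Hypothetical stereo buffer: 4 frames, left = 1..4, right = -1..-4
interleaved = np.array([1, -1, 2, -2, 3, -3, 4, -4]).astype("<i2").tobytes()
channels = split_channels(interleaved, n_channels=2)
left = channels[:, 0]   # 4 samples, half the length of the raw interleaved data
print(left, magnitude_to_db(np.array([1.0, 10.0])))
```

With a real file, n_channels would come from wav.getnchannels(), and the bytes from wav.readframes(-1).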
Upvotes: 0
Reputation: 31050
Further to @Paul R's answer, scipy.signal.spectrogram is a spectrogram function in SciPy's signal processing module.
The example from its documentation is as follows:
from scipy import signal
import numpy as np
import matplotlib.pyplot as plt
# Generate a test signal, a 2 Vrms sine wave whose frequency linearly
# changes with time from 1kHz to 2kHz, corrupted by 0.001 V**2/Hz of
# white noise sampled at 10 kHz.
fs = 10e3
N = int(1e5)  # sample count must be an integer for np.linspace
amp = 2 * np.sqrt(2)
noise_power = 0.001 * fs / 2
time = np.arange(N) / fs
freq = np.linspace(1e3, 2e3, N)
x = amp * np.sin(2*np.pi*freq*time)
x += np.random.normal(scale=np.sqrt(noise_power), size=time.shape)
#Compute and plot the spectrogram.
f, t, Sxx = signal.spectrogram(x, fs)
plt.pcolormesh(t, f, Sxx)
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()
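To get from the arrays f, t and Sxx to the [time, frequency, amplitude] rows asked for in the question, something like the following might work. It is only a sketch: harmonic_rows is a made-up helper, it assumes non-overlapping segments of step_ms each, and it picks the FFT bin nearest to each requested frequency. The magnitudes are in spectrogram's own scaling, so treat them as relative values.

```python
import numpy as np
from scipy import signal

def harmonic_rows(x, fs, harmonics_hz, step_ms=100):
    """Return [start_ms, frequency_hz, magnitude] rows, one per segment and harmonic."""
    nperseg = int(fs * step_ms / 1000)          # one segment per step_ms
    f, t, Sxx = signal.spectrogram(x, fs, nperseg=nperseg, noverlap=0,
                                   mode="magnitude")
    rows = []
    for j, centre in enumerate(t):              # t holds segment centre times
        start_ms = int(round((centre - nperseg / (2 * fs)) * 1000))
        for hz in harmonics_hz:
            k = int(np.argmin(np.abs(f - hz)))  # nearest frequency bin
            rows.append([start_ms, float(f[k]), float(Sxx[k, j])])
    return rows

# Illustrative input: 1 s of a 100 Hz sine sampled at 8 kHz
fs = 8000
time = np.arange(fs) / fs
x = 2 * np.sin(2 * np.pi * 100 * time)
rows = harmonic_rows(x, fs, [100, 200, 300])
for row in rows[:6]:
    print(row)
```

For a wav file, x would be one channel from scipy.io.wavfile.read and fs the returned sample rate.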
Upvotes: 6
Reputation: 212979
It looks like you're trying to implement a spectrogram, which is a sequence of power spectrum estimates, typically implemented with a succession of (usually overlapping) FFTs. Since you only compute one FFT (one spectrum), you have no time dimension yet. Put your FFT code in a loop and process one block of samples (e.g. 1024) per iteration, with a 50% overlap between successive blocks. The sequence of generated spectra will then be a 3D array of time v frequency v magnitude.
I'm not a Python person, but I can give you some pseudo code which should be enough to get you coding:
N = length of data input
N_FFT = no of samples per block (== FFT size, e.g. 1024)
i = 0 ;; i = index of spectrum within 3D output array
block_start = 0
while block_start <= N - N_FFT
    block_end = block_start + N_FFT
    get samples from block_start .. block_end
    apply window function to block (e.g. Hamming)
    apply FFT to windowed block
    calculate magnitude spectrum in dB (10 * log10( re*re + im*im ))
    store spectrum in output array at index i
    block_start += N_FFT / 2 ;; NB: 50% overlap
    i++
end
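For reference, the loop above might translate to NumPy roughly as follows. This is a sketch, not a tuned implementation: block_spectra_db and its parameters are made-up names, the dB value is computed as 10 * log10 of the power re*re + im*im (equivalent to 20 * log10 of the magnitude), and a small floor is assumed to avoid taking the log of zero.

```python
import numpy as np

def block_spectra_db(samples, rate, n_fft=1024, overlap=0.5):
    """Windowed FFT of overlapping blocks; returns (times_s, freqs_hz, spectra_db)."""
    hop = max(1, int(n_fft * (1 - overlap)))    # 50% overlap -> hop of n_fft / 2
    window = np.hamming(n_fft)
    times, spectra = [], []
    for start in range(0, len(samples) - n_fft + 1, hop):
        block = samples[start:start + n_fft] * window
        spectrum = np.fft.rfft(block)
        power = spectrum.real ** 2 + spectrum.imag ** 2
        spectra.append(10 * np.log10(np.maximum(power, 1e-12)))
        times.append(start / rate)              # start time of the block in seconds
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / rate)
    return np.array(times), freqs, np.array(spectra)

# Illustrative input: 1 s of a 1 kHz sine sampled at 8 kHz
rate = 8000
samples = np.sin(2 * np.pi * 1000 * np.arange(rate) / rate)
times, freqs, spectra = block_spectra_db(samples, rate)
peak = int(np.argmax(spectra[0]))
print(spectra.shape, freqs[peak])
```

Each row of spectra is one time step and each column one frequency bin, so spectra[i, k] is the level in dB at times[i] and freqs[k].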
Upvotes: 4