Reputation: 13
I'm trying to find the frequency spectrum of people speaking in a wav file, but before that I figured I'd try it with a simple 200 Hz audio file. In the following code, I read in the 200 Hz file and plot it on the screen. Note: the 200 Hz file has a sample rate of 192000 Hz. My chunk size is 1/10th of that, so each chunk is 19200 samples.
from scipy.io import wavfile
import numpy as np
### This is just for drawing
import matplotlib.pyplot as plt
import matplotlib.animation as animation
### Above is for drawing
# Read the .wav file
sample_rate, data = wavfile.read('200hz.wav')
CHUNK_SAMPLES_PER_SECOND = 10
CHUNK = sample_rate // CHUNK_SAMPLES_PER_SECOND  # integer chunk length (19200 samples here)
# Now compute the spectrum on a given frame
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)
# Now, lets just draw the plot
for frame in range(len(data) // CHUNK):
    ax1.clear()
    frame_data = data[frame * CHUNK:(frame + 1) * CHUNK, 0] # normally 2 channel, take 1st channel
    frame_data = frame_data * 1.0 / frame_data.max()
    #### Below, activate those to use the FFT ####
    # frame_data = np.fft.fft(frame_data) # Calculate FFT on dataset
    # frame_data = frame_data * 1.0 / frame_data.max() # Normalize FFT data
    # ax1.set_xlabel('frequency')
    ax1.plot(np.abs(frame_data), '-')
    ax1.set_xlabel('sample')
    ax1.set_ylabel('volume')
    plt.pause(1.0 / CHUNK_SAMPLES_PER_SECOND)
The above code produces:
To me, this looks correct. Since I am only taking 19200 samples at a 192000 Hz sample rate, the plot should cover 0.1 seconds. Thus, a 200 Hz signal should show roughly 20 full waves.
When I then go to enable the following code by uncommenting:
#### Below, activate those to use the FFT ####
# frame_data = np.fft.fft(frame_data) # Calculate FFT on dataset
# frame_data = frame_data * 1.0 / frame_data.max() # Normalize FFT data
# ax1.set_xlabel('frequency')
It produces a funky looking fft chart:
I guess what I expected it to show was a peak at around 200 Hz, or at least one well-defined peak at the frequency of the signal. Thanks!
Edit: I added the actual audio file I was using here.
I also adjusted the Y axis to log scale and the x axis range below:
Upvotes: 1
Views: 244
Reputation: 60504
Your frequency axis goes from 0 to 19200. This is incorrect: taking a smaller chunk does not reduce your sampling frequency. It should go from 0 to 192000 Hz.
So imagine each value along this axis is multiplied by 10: with 19200 samples taken at 192000 Hz, each FFT bin is 192000 / 19200 = 10 Hz wide. You thus have a peak at 200 Hz, as expected, and a bunch of peaks at integer multiples of that, also as expected. Note that your sample is not a perfect sinusoid; its shape gives it a lot of harmonics.
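To make the axis scaling concrete, here is a minimal sketch that maps FFT bins to frequencies in Hz with `np.fft.fftfreq`. It assumes a synthetic, pure 200 Hz sine as a stand-in for one chunk of the recording (the real file is not used here), and the variable names are only illustrative.

```python
import numpy as np

sample_rate = 192000        # Hz, as stated in the question
n = sample_rate // 10       # 19200 samples, i.e. one 0.1 s chunk

# Assumption: a pure 200 Hz sine stands in for one chunk of the wav file.
t = np.arange(n) / sample_rate
chunk = np.sin(2 * np.pi * 200 * t)

spectrum = np.abs(np.fft.fft(chunk))

# Frequency in Hz of every FFT bin; adjacent bins are sample_rate / n apart.
freqs = np.fft.fftfreq(n, d=1.0 / sample_rate)
bin_spacing = freqs[1] - freqs[0]

# Look for the peak among the positive frequencies only (first half).
peak_bin = int(np.argmax(spectrum[: n // 2]))
peak_hz = freqs[peak_bin]

print(bin_spacing, peak_bin, peak_hz)
```

Plotting with `ax1.plot(freqs[: n // 2], spectrum[: n // 2])` instead of plotting against the bin index would put the peak at 200 Hz on the x axis.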
Note also that the second large peak close to 192000 Hz corresponds to the "negative frequency": for a real-valued input, the second half of the DFT output is redundant, a mirrored copy of the first half.
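The mirroring can be checked numerically, and `np.fft.rfft` avoids computing the redundant half in the first place. The sketch below again assumes a synthetic 200 Hz sine in place of the actual recording.

```python
import numpy as np

sample_rate = 192000
n = 19200

# Assumption: synthetic real-valued 200 Hz chunk, not the original wav data.
t = np.arange(n) / sample_rate
chunk = np.sin(2 * np.pi * 200 * t)

full = np.fft.fft(chunk)

# For real input, bin n - k is the complex conjugate of bin k,
# so the magnitudes in the second half mirror those in the first half.
mirrored = np.allclose(full[1:], np.conj(full[1:][::-1]))

# rfft returns only the non-redundant bins 0 .. n/2.
half = np.fft.rfft(chunk)
freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
peak_hz = freqs[np.argmax(np.abs(half))]

print(mirrored, len(half), peak_hz, freqs[-1])
```

With `rfft`, the frequency axis stops at half the sampling rate (96000 Hz here), so the spurious-looking peak near 192000 Hz no longer appears in the plot.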
Upvotes: 2