Dan Miller

Reputation: 13

Can't generate Numpy FFT properly

I'm trying to find the frequency spectrum of people speaking in a WAV file, but before that I figured I'd try it with a simple 200 Hz audio file. In the following code, I read in the 200 Hz file and plot it on the screen. Note: the file has a sample rate of 192000 Hz. My chunk size is 1/10th of that, so each chunk is 19200 samples.

from scipy.io import wavfile
import numpy as np

### This is just for drawing
import matplotlib.pyplot as plt
import matplotlib.animation as animation

### Above is for drawing

# Read the .wav file
sample_rate, data = wavfile.read('200hz.wav')
CHUNK_SAMPLES_PER_SECOND = 10
CHUNK = sample_rate // CHUNK_SAMPLES_PER_SECOND  # integer division so CHUNK can be used as a slice index

# Now compute the spectrum on a given frame
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)

# Now, let's just draw the plot
for frame in range(len(data) // CHUNK):
    ax1.clear()
    frame_data = data[frame * CHUNK:(frame + 1) * CHUNK, 0] # normally 2 channel, take 1st channel
    frame_data = frame_data * 1.0 / frame_data.max()

    #### Below, activate those to use the FFT ####
    # frame_data = np.fft.fft(frame_data) # Calculate FFT on dataset
    # frame_data = frame_data * 1.0 / frame_data.max() # Normalize FFT data
    # ax1.set_xlabel('frequency')

    ax1.plot(np.abs(frame_data), '-')
    ax1.set_xlabel('sample')
    ax1.set_ylabel('volume')
    plt.pause(1.0 / CHUNK_SAMPLES_PER_SECOND)

The above code produces:

Regular signal plot

To me, this looks correct. Since I am only taking 19200 samples at a 192000 Hz sample rate, the plot should cover 0.1 seconds. Thus, a 200 Hz signal should show roughly 20 full waves.

When I then go to enable the following code by uncommenting:

#### Below, activate those to use the FFT ####
# frame_data = np.fft.fft(frame_data) # Calculate FFT on dataset
# frame_data = frame_data * 1.0 / frame_data.max() # Normalize FFT data
# ax1.set_xlabel('frequency')

It produces a funky looking fft chart:

a funky looking fft chart

I guess what I expected it to show was a peak at around 200 Hz, or at least one well-defined peak at the frequency of the signal. Thanks!

Edit: I added the actual audio file I was using here.

I also adjusted the Y axis to log scale and the x axis range below:

here

Upvotes: 1

Views: 244

Answers (1)

Cris Luengo

Reputation: 60504

Your frequency axis goes from 0 to 19200. This is incorrect: taking a smaller chunk does not reduce your sampling frequency. The axis should go from 0 to 192000 Hz.

So imagine each value along this axis is multiplied by 10, since each of the 19200 bins is 192000 / 19200 = 10 Hz wide. You thus have a peak at 200 Hz, as expected, and a bunch of peaks at integer multiples of that, also as expected. Note that your sample is not a perfect sinusoid; its shape gives it a lot of harmonics.
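As a minimal sketch of that rescaling, `np.fft.fftfreq` can convert bin indices to Hz. It assumes a synthetic, pure 200 Hz sine in place of the file from the question; the signal and the variable names are illustrative only.

```python
import numpy as np

sample_rate = 192000          # Hz, as stated in the question
n = sample_rate // 10         # 19200 samples, i.e. a 0.1 s chunk

# Assumption: a pure 200 Hz sine stands in for the real recording.
t = np.arange(n) / sample_rate
chunk = np.sin(2 * np.pi * 200 * t)

spectrum = np.fft.fft(chunk)

# Frequency of each bin in Hz; the spacing is sample_rate / n = 10 Hz.
freqs = np.fft.fftfreq(n, d=1.0 / sample_rate)
bin_spacing = freqs[1] - freqs[0]

# Search only the non-negative half of the spectrum for the peak.
half = n // 2
peak_bin = int(np.argmax(np.abs(spectrum[:half])))
peak_freq = freqs[peak_bin]

print(bin_spacing, peak_bin, peak_freq)
```

Plotting `np.abs(spectrum)` against `freqs` rather than against the bin index should put the peak at 200 Hz on the x axis.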

Note also that the second large peak close to 192000 Hz corresponds to the "negative frequency": for real-valued input such as audio samples, the second half of the DFT output is redundant, a mirrored (complex-conjugate) copy of the first half.
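For real-valued input, one way to avoid the mirrored half is `np.fft.rfft`, which returns only the bins from 0 Hz up to the Nyquist frequency. The sketch below is a hedged illustration that uses a synthetic 200 Hz sine as a stand-in for the recording; the variable names are not from the question.

```python
import numpy as np

sample_rate = 192000
n = sample_rate // 10

# Assumption: synthetic 200 Hz tone, not the file from the question.
t = np.arange(n) / sample_rate
chunk = np.sin(2 * np.pi * 200 * t)

# Full FFT of real input: bin n - k is the complex conjugate of bin k.
full = np.fft.fft(chunk)
mirror_ok = np.allclose(full[1:], np.conj(full[1:][::-1]))

# rfft keeps only the non-redundant half: n // 2 + 1 bins.
half = np.fft.rfft(chunk)
freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)  # 0 Hz .. sample_rate / 2

# Normalise using magnitudes rather than the raw complex values.
magnitude = np.abs(half)
magnitude /= magnitude.max()
peak_freq = freqs[np.argmax(magnitude)]

print(mirror_ok, len(half), freqs[-1], peak_freq)
```

Plotting `magnitude` against `freqs` then shows a single-sided spectrum with no mirrored peak near the sample rate.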

Upvotes: 2
