ymliu
ymliu

Reputation: 365

FFT in python cannot plot correct frequence

import datetime
import shlex

import numpy as np
from scipy.fftpack import rfft, irfft, fftfreq

strWeatherFile = "data/weather.dat"

signal = []

with open(strWeatherFile, 'rt') as f:
    for line in f:
        arrDat = shlex.split(line)
        d = float(arrDat[3])
        if d < -30:  # if it's dummy data
            d = signal[len(signal) - 1]
        signal.append(d)

size = len(signal)

time   = np.linspace(1,size,size)

f_signal = rfft(signal)


import pylab as plt
plt.subplot(221)
plt.plot(time,signal)
plt.subplot(222)
plt.plot(time,f_signal)
plt.xlim(0,size)

plt.show()

This is weather data, so it is supposed to show spike at freq:365. But the result doesn't like what expected. Is there anything wrong with the code?

data is from below link: http://academic.udayton.edu/kissock/http/Weather/gsod95-current/CALOSANG.txt

Upvotes: 1

Views: 850

Answers (3)

Babson
Babson

Reputation: 949

The data file contains 14 bad samples with values of -99. I replaced the bad samples with linearly interpolated values using the nearest good samples. The total number of samples in the file is 7006. The data is for an unknown daily weather parameter.

As Ben said, the frequency units are 1/Days not Days, so you will not see a peak at 365 frequency units (assuming the data is periodic with period 365 days).

The graph below shows the data input to the FFT. The bad samples have been removed using the linear interpolation procedure. The zero values at the right end are added automatically by the FFT, in order to pad the number of input samples to the next higher power-of-2.

Time Domain graph - sooeet.com FFT calculator

The graph below shows the data input to the FFT, with the DC offset (62.3127) removed.

Time Domain graph, DC offset removed - sooeet.com FFT calculator

The graph below shows the FFT output at full scale. The FFT input includes the DC offset.

FFT spectrum graph - sooeet.com FFT calculator

The graph below zooms into the low frequency end of the FFT output. A peak is visible near the left side of the graph. That peak corresponds to the frequency you were looking for. The FFT input includes the DC offset.

FFT spectrum graph zoomed - sooeet.com FFT calculator

The graph below shows the frequency peak of interest. The peak is at 0.0026855 (22/8192) frequency units, which corresponds to a time value of 372 days (1/0.0026855). The FFT input includes the DC offset.

FFT spectrum peak - sooeet.com FFT calculator

The graph below shows the FFT after the input data was windowed with the Blackman-Harris 92 dB high resolution window. The frequency peak of interest is revealed 18 dB above the surrounding background. Windowing in this case markedly reduces spectral leakage. After windowing the peak remains at 0.0026855 (22/8192) frequency units, which corresponds to a time value of 372 days (1/0.0026855). The FFT input includes the DC offset.

FFT window function applied - spectrum peak - sooeet.com FFT calculator

The graph below shows the FFT after removing the DC offset (62.3127) from the input data, and no windowing applied to the input data.

FFT spectrum, DC offset removed - sooeet.com FFT calculator

The graph below shows the FFT after removing the DC offset (62.3127) from the input data, and applying the Blackman-Harris 92 dB high resolution window.

FFT spectrum, DC offset removed, window function applied - sooeet.com FFT calculator

FFT and graphs were done with the Sooeet FFT calculator

Upvotes: 2

Aaron Hall
Aaron Hall

Reputation: 395743

I don't necessarily have a good answer here, but I wanted to make your code easy to replicate for others, so they don't have to fiddle with a manual download and files:

import datetime
import urllib2
import numpy as np    
from scipy.fftpack import rfft, irfft, fftfreq    

url = 'http://academic.udayton.edu/kissock/http/Weather/gsod95-current/CALOSANG.txt'
response = urllib2.urlopen(url)
html = response.read()
rows = [[float(c) if '.' in c else int(c)
         for c in row.split()] 
         for row in html.splitlines()]

signal = []
for line in rows:
    d = float(line[3])
    if d < -30:  # if it's dummy data
        d = signal[len(signal) - 1]
    signal.append(d)

size = len(signal)
time   = np.linspace(1,size,size)
f_signal = rfft(signal)
import pylab as plt
plt.subplot(221)
plt.plot(time,signal)
plt.subplot(222)
plt.plot(time,f_signal)
plt.xlim(0,size)
plt.show()

Upvotes: 0

Ben
Ben

Reputation: 7124

Your FFT should be plotted against frequency, not time.

import shlex
import pylab as plt
import numpy as np
from scipy.fftpack import rfft, rfftfreq

strWeatherFile = "data/weather.dat"

signal = []

with open(strWeatherFile, 'rt') as f:
    for line in f:
        arrDat = shlex.split(line)
        d = float(arrDat[3])
        if d < -30:  # if it's dummy data
            d = signal[len(signal) - 1]
        signal.append(d)

size = len(signal)

time = np.arange(1,size+1)

f_signal = rfft(signal)
freq = rfftfreq(size, d=1.)


plt.ion()
plt.close("all")
plt.figure()
plt.plot(time,signal)
plt.figure()
plt.plot(freq,abs(f_signal))
plt.show()

You wouldn't expect to see a spike at 365 because that's the time value. The corresponding frequency to a signal with a periodicity of 365 would be 1/365 or .00274

Upvotes: 2

Related Questions