Reputation: 365
import datetime
import shlex
import numpy as np
from scipy.fftpack import rfft, irfft, fftfreq
strWeatherFile = "data/weather.dat"
signal = []
with open(strWeatherFile, 'rt') as f:
for line in f:
arrDat = shlex.split(line)
d = float(arrDat[3])
if d < -30: # if it's dummy data
d = signal[len(signal) - 1]
signal.append(d)
size = len(signal)
time = np.linspace(1,size,size)
f_signal = rfft(signal)
import pylab as plt
plt.subplot(221)
plt.plot(time,signal)
plt.subplot(222)
plt.plot(time,f_signal)
plt.xlim(0,size)
plt.show()
This is weather data, so it is supposed to show spike at freq:365. But the result doesn't like what expected. Is there anything wrong with the code?
data is from below link: http://academic.udayton.edu/kissock/http/Weather/gsod95-current/CALOSANG.txt
Upvotes: 1
Views: 850
Reputation: 949
The data file contains 14 bad samples with values of -99. I replaced the bad samples with linearly interpolated values using the nearest good samples. The total number of samples in the file is 7006. The data is for an unknown daily weather parameter.
As Ben said, the frequency units are 1/Days not Days, so you will not see a peak at 365 frequency units (assuming the data is periodic with period 365 days).
The graph below shows the data input to the FFT. The bad samples have been removed using the linear interpolation procedure. The zero values at the right end are added automatically by the FFT, in order to pad the number of input samples to the next higher power-of-2.
The graph below shows the data input to the FFT, with the DC offset (62.3127) removed.
The graph below shows the FFT output at full scale. The FFT input includes the DC offset.
The graph below zooms into the low frequency end of the FFT output. A peak is visible near the left side of the graph. That peak corresponds to the frequency you were looking for. The FFT input includes the DC offset.
The graph below shows the frequency peak of interest. The peak is at 0.0026855 (22/8192) frequency units, which corresponds to a time value of 372 days (1/0.0026855). The FFT input includes the DC offset.
The graph below shows the FFT after the input data was windowed with the Blackman-Harris 92 dB high resolution window. The frequency peak of interest is revealed 18 dB above the surrounding background. Windowing in this case markedly reduces spectral leakage. After windowing the peak remains at 0.0026855 (22/8192) frequency units, which corresponds to a time value of 372 days (1/0.0026855). The FFT input includes the DC offset.
The graph below shows the FFT after removing the DC offset (62.3127) from the input data, and no windowing applied to the input data.
The graph below shows the FFT after removing the DC offset (62.3127) from the input data, and applying the Blackman-Harris 92 dB high resolution window.
FFT and graphs were done with the Sooeet FFT calculator
Upvotes: 2
Reputation: 395743
I don't necessarily have a good answer here, but I wanted to make your code easy to replicate for others, so they don't have to fiddle with a manual download and files:
import datetime
import urllib2
import numpy as np
from scipy.fftpack import rfft, irfft, fftfreq
url = 'http://academic.udayton.edu/kissock/http/Weather/gsod95-current/CALOSANG.txt'
response = urllib2.urlopen(url)
html = response.read()
rows = [[float(c) if '.' in c else int(c)
for c in row.split()]
for row in html.splitlines()]
signal = []
for line in rows:
d = float(line[3])
if d < -30: # if it's dummy data
d = signal[len(signal) - 1]
signal.append(d)
size = len(signal)
time = np.linspace(1,size,size)
f_signal = rfft(signal)
import pylab as plt
plt.subplot(221)
plt.plot(time,signal)
plt.subplot(222)
plt.plot(time,f_signal)
plt.xlim(0,size)
plt.show()
Upvotes: 0
Reputation: 7124
Your FFT should be plotted against frequency, not time.
import shlex
import pylab as plt
import numpy as np
from scipy.fftpack import rfft, rfftfreq
strWeatherFile = "data/weather.dat"
signal = []
with open(strWeatherFile, 'rt') as f:
for line in f:
arrDat = shlex.split(line)
d = float(arrDat[3])
if d < -30: # if it's dummy data
d = signal[len(signal) - 1]
signal.append(d)
size = len(signal)
time = np.arange(1,size+1)
f_signal = rfft(signal)
freq = rfftfreq(size, d=1.)
plt.ion()
plt.close("all")
plt.figure()
plt.plot(time,signal)
plt.figure()
plt.plot(freq,abs(f_signal))
plt.show()
You wouldn't expect to see a spike at 365 because that's the time value. The corresponding frequency to a signal with a periodicity of 365 would be 1/365 or .00274
Upvotes: 2