Logic1
Logic1

Reputation: 1847

Python Numpy FFT -or- RFFT to find period of a wave instead of its frequiency?

I'm new to signal analysis and thought I would take on a project to try to learn Python's FFT module by attempting to analyze the stability of the air temperature in one of our labs.

I wrote this python script that has some real data from our sensor. And I'll explain some of the initial variables here:

"data" Is the data taken from the database. Normally they could be assumed to be in 120 second intervals however that is not guaranteed. so to help calculate a quick average sample rate I added:

"temporal_window" Which is the time in seconds from the first to the last measurement. So where:

T = temporal_window/N #should equal roughly 120 seconds

"debug" In normal operation the data is fed to the FFT via the array built from the database (aka "data"), but as I was trying to understand how the FFT worked I decided to make a "diagnostics_array" which is just an array with the same number of data points as the array from the database but has a sine wave where the given wavelength is in seconds.

import numpy as np
import numpy.fft as fft
import matplotlib.pyplot as plt

data = np.array([17.38 , 17.66 , 18.26 , 18.62 , 18.98 , 19.42 , 19.7 , 19.38 , 18.46 , 17.82 , 17.5 , 17.3 , 17.9 , 18.3 , 18.66 , 19.06 , 19.5 , 19.78 , 19.94 , 19.06 , 18.06 , 17.54 , 17.26 , 18.02 , 18.42 , 18.78 , 19.18 , 19.54 , 19.82 , 19.42 , 18.54 , 17.74 , 17.34 , 17.18 , 17.86 , 18.38 , 18.7 , 19.02 , 19.42 , 19.7 , 19.42 , 18.38 , 17.74 , 17.34 , 17.66 , 18.22 , 18.46 , 18.82 , 19.26 , 19.62 , 19.78 , 18.78 , 17.98 , 17.46 , 17.3 , 17.98 , 18.38 , 18.74 , 19.06 , 19.42 , 19.74 , 19.98 , 19.54 , 18.46 , 17.82 , 17.26 , 17.7 , 18.3 , 18.62 , 18.98 , 19.42 , 19.74 , 19.9 , 19.1 , 18.14 , 17.74 , 17.98 , 18.38 , 18.74 , 19.1 , 19.54 , 19.82 , 19.38 , 18.54 , 17.9 , 17.58 , 18.14 , 18.58 , 18.9 , 19.3 , 19.62 , 19.9 , 19.54 , 18.54 , 17.82 , 17.38 , 17.74 , 18.3 , 18.7 , 19.1 , 19.42 , 19.66 , 18.78 , 17.94 , 17.42 , 17.22 , 17.94 , 18.38 , 18.82 , 19.18 , 19.58 , 19.82 , 19.94 , 19.02 , 18.22 , 17.66 , 17.46 , 18.1 , 18.46 , 18.86 , 19.18 , 19.58 , 19.9 , 19.46 , 18.5 , 17.82 , 17.38 , 17.66 , 18.26 , 18.66 , 19.02 , 19.46 , 19.78 , 19.94 , 19.06 , 19.18 , 19.58 , 19.94 , 20.22 , 20.38 , 20.54 , 20.58 , 20.06 , 18.94 , 18.14 , 17.74 , 17.34 , 17.7 , 18.3 , 18.7 , 19.02 , 19.42 , 19.74 , 19.9 , 19.02 , 18.22 , 17.66 , 17.3 , 17.7 , 18.3 , 18.7 , 18.98 , 19.38 , 19.74 , 19.42 , 18.5 , 17.74 , 17.26 , 17.66 , 18.3 , 18.62 , 19.02 , 19.42 , 19.74 , 19.94 , 18.98 , 18.22 , 17.78 , 17.58 , 18.14 , 18.5 , 18.86 , 19.18 , 19.58 , 19.78 , 18.86 , 18.02 , 17.58 , 17.34 , 18.02 , 18.38 , 18.78 , 19.14 , 19.58 , 19.82 , 19.5 , 18.5 , 17.86 , 17.46 , 17.74 , 18.3 , 18.62 , 19.06 , 19.42 , 19.74 , 18.86 , 17.98 , 17.54 , 17.18 , 17.98 , 18.38 , 18.74 , 19.1 , 19.54 , 19.86 , 19.46 , 18.46 , 17.9 , 17.3 , 17.66 , 18.22 , 18.66 , 18.94 , 19.42 , 19.78 , 19.42 , 18.46 , 17.82 , 18.02 , 18.5 , 18.86 , 19.26 , 19.62 , 19.34 , 18.42 , 17.86 , 18.02 , 18.46 , 18.78 , 19.26 , 19.58 , 19.34 , 18.3 , 17.7 , 17.42 , 18.1 , 18.5 , 18.78 , 19.22 , 19.62 , 19.74 , 18.78 , 17.98 , 17.42 , 17.14 , 17.42 , 18.02 , 18.42 , 18.74 , 19.14 , 19.5 , 19])
temporal_window = 42014.0 #seconds

N = len(data) #datapoints
T = temporal_window/N #should equal roughly 120 seconds

###Diagnostic Override###
debug = True #DEBUG SWITCH
if debug:
    wave_lenght = 60*60*1 #in seconds (eg. 60*60*2 = 2 hours)
    print "Created a sine wave with %s second period" % wave_lenght
    diagnostic_array = np.arange(0,1,1./N)
    diagnostic_array = np.cos(2*np.pi*temporal_window/wave_lenght*diagnostic_array)
    data = diagnostic_array
#########################

a=np.abs(fft.rfft(data))
a[0]=0 #Not sure if this is a good idea but seems to help with choppy data..
xt = np.linspace(0.0, temporal_window, a.size)

print "Peak found at %s second period" % int(xt[np.argmax(a)])

plt.subplot(211)
plt.plot(xt,a)
plt.subplot(212)
plt.plot(np.linspace(0,temporal_window,data.size),data)
plt.show()

so when running the code from above I get the following print statements:

>>> #1 hour period
Created a sine wave with 3600 second period
Peak found at 3848 second period

show the FFT of a sinewave with a one hour period over 42014 seconds

>>> #2 hour period
Created a sine wave with 7200 second period
Peak found at 1924 second period

show the FFT of a sine wave with a two hour period over 42014 seconds

so the result of the FFT's peak value seems to get smaller as the wavelengths get longer (totally expected). But what I am unsure about is how to change it so that in this example the peak match the wavelength in seconds. Is it possible with FFT? I was reading about IFFT to convert back to the time domain but without a good understanding of the subject I'm at a bit of a loss..

Any ideas or thoughts on how to accomplish that would be greatly appreciated!! And if I have not explained my intentions clearly please let me know and I'll be happy to add details. Many thanks!!

Upvotes: 1

Views: 10601

Answers (1)

Logic1
Logic1

Reputation: 1847

Thanks to hobbs for the little push I re-evaluated what I was actually looking at.

After a little further research I found the rfftfreq function to be quite handy as opposed to linspace.

So heres the updated code that seems to work as expected. As a note I get "RuntimeWarning: divide by zero" where I do np.divide(60,freqs). This however doesn't seem to effect the result.

I did notice that with the current diagnostic part of the script it allows for leakage in the FFT because its not concerned with fitting whole waves to the data set (eg. could be 1.3 waves long or whatever).

So to really see this in action (where the peak FFT matches the input waveform period) all you have to do is change this line:

-from-

wave_lenght = 60*60*1 #in seconds (eg. 60*60*2 = 2 hours)

-to-

whole_waves = 2
wave_lenght = temporal_window/whole_waves #fits n number of whole waves within the dataset

This makes the wave a function of the total time as opposed to a set wavelength and thus fits it nicely within the dataset.

Heres the complete updated script. If anyone spots errors please comment (I'm still learning this stuff and love the community's feedback)!

import numpy as np
import numpy.fft as fft
import matplotlib.pyplot as plt

data = np.array([17.38 , 17.66 , 18.26 , 18.62 , 18.98 , 19.42 , 19.7 , 19.38 , 18.46 , 17.82 , 17.5 , 17.3 , 17.9 , 18.3 , 18.66 , 19.06 , 19.5 , 19.78 , 19.94 , 19.06 , 18.06 , 17.54 , 17.26 , 18.02 , 18.42 , 18.78 , 19.18 , 19.54 , 19.82 , 19.42 , 18.54 , 17.74 , 17.34 , 17.18 , 17.86 , 18.38 , 18.7 , 19.02 , 19.42 , 19.7 , 19.42 , 18.38 , 17.74 , 17.34 , 17.66 , 18.22 , 18.46 , 18.82 , 19.26 , 19.62 , 19.78 , 18.78 , 17.98 , 17.46 , 17.3 , 17.98 , 18.38 , 18.74 , 19.06 , 19.42 , 19.74 , 19.98 , 19.54 , 18.46 , 17.82 , 17.26 , 17.7 , 18.3 , 18.62 , 18.98 , 19.42 , 19.74 , 19.9 , 19.1 , 18.14 , 17.74 , 17.98 , 18.38 , 18.74 , 19.1 , 19.54 , 19.82 , 19.38 , 18.54 , 17.9 , 17.58 , 18.14 , 18.58 , 18.9 , 19.3 , 19.62 , 19.9 , 19.54 , 18.54 , 17.82 , 17.38 , 17.74 , 18.3 , 18.7 , 19.1 , 19.42 , 19.66 , 18.78 , 17.94 , 17.42 , 17.22 , 17.94 , 18.38 , 18.82 , 19.18 , 19.58 , 19.82 , 19.94 , 19.02 , 18.22 , 17.66 , 17.46 , 18.1 , 18.46 , 18.86 , 19.18 , 19.58 , 19.9 , 19.46 , 18.5 , 17.82 , 17.38 , 17.66 , 18.26 , 18.66 , 19.02 , 19.46 , 19.78 , 19.94 , 19.06 , 19.18 , 19.58 , 19.94 , 20.22 , 20.38 , 20.54 , 20.58 , 20.06 , 18.94 , 18.14 , 17.74 , 17.34 , 17.7 , 18.3 , 18.7 , 19.02 , 19.42 , 19.74 , 19.9 , 19.02 , 18.22 , 17.66 , 17.3 , 17.7 , 18.3 , 18.7 , 18.98 , 19.38 , 19.74 , 19.42 , 18.5 , 17.74 , 17.26 , 17.66 , 18.3 , 18.62 , 19.02 , 19.42 , 19.74 , 19.94 , 18.98 , 18.22 , 17.78 , 17.58 , 18.14 , 18.5 , 18.86 , 19.18 , 19.58 , 19.78 , 18.86 , 18.02 , 17.58 , 17.34 , 18.02 , 18.38 , 18.78 , 19.14 , 19.58 , 19.82 , 19.5 , 18.5 , 17.86 , 17.46 , 17.74 , 18.3 , 18.62 , 19.06 , 19.42 , 19.74 , 18.86 , 17.98 , 17.54 , 17.18 , 17.98 , 18.38 , 18.74 , 19.1 , 19.54 , 19.86 , 19.46 , 18.46 , 17.9 , 17.3 , 17.66 , 18.22 , 18.66 , 18.94 , 19.42 , 19.78 , 19.42 , 18.46 , 17.82 , 18.02 , 18.5 , 18.86 , 19.26 , 19.62 , 19.34 , 18.42 , 17.86 , 18.02 , 18.46 , 18.78 , 19.26 , 19.58 , 19.34 , 18.3 , 17.7 , 17.42 , 18.1 , 18.5 , 18.78 , 19.22 , 19.62 , 19.74 , 18.78 , 17.98 , 17.42 , 17.14 , 17.42 , 18.02 , 18.42 , 18.74 , 19.14 , 19.5 , 19])
temporal_window = 42014.0 #seconds

N = len(data) #datapoints
T = 60/(temporal_window/N) #Sample rate average (readings/minute)

###Diagnostic Override###
debug = False #DEBUG SWITCH
if debug:
    wave_lenght = 800 #in seconds (eg. 60*60*2 = 2 hours)
    print "Created a sine wave with %s second period" % wave_lenght
    diagnostic_array = np.arange(0,1,1./N)
    diagnostic_array = np.cos(2*np.pi*temporal_window/wave_lenght*diagnostic_array)
    data = diagnostic_array
#########################
    
a=np.abs(fft.rfft(data, n=data.size))
a[0]=0 #Not sure if this is a good idea but seems to help with choppy data..
freqs = fft.rfftfreq(data.size, d=1./T)
freqs = np.divide(60,freqs)

max_freq = freqs[np.argmax(a)]
print "Peak found at %s second period (%s minutes)" % (max_freq, max_freq/60)

plt.subplot(211)
plt.plot(freqs,a)
plt.subplot(212)
plt.plot(np.linspace(0,temporal_window,data.size),data)
plt.show()

Running above code produces this print statement:

>>>#Using data from database    
Peak found at 1710.49363868 second period (28.5082273113 minutes)

enter image description here

UPDATE

I rewrote the diagnostic script to further test the reliability of this code. It allows you to create sets of waves that are superimposed, but also gives you some options as to how you want the wave to be represented.

>>>#standing_wave_list = [4,8,9,21,88]
Added a sine wave with 10503.5 second period
Added a sine wave with 5251.75 second period
Added a sine wave with 4668.22222222 second period
Added a sine wave with 2000.66666667 second period
Added a sine wave with 477.431818182 second period
Peak found at 5251.75 second period (87.5291666667 minutes)

Composit wave

if you want to try it yourself its as easy as cut and past into the previous code:

###Diagnostic Override###
debug = True #DEBUG SWITCH
if debug:
    def build_waveform(wave_set, by_period=False):
        #superimposed sine waves (integers create perfet standing waves)
        wave_date = np.zeros(data.size)
        for wave in wave_set:
            if by_period:
                wave_lenght = wave #creates a wave with period n seconds
            else:
                wave_lenght = temporal_window/wave #fits n number of whole waves within the dataset
            new_wave = np.arange(0,1,1./N)
            new_wave = np.cos(2*np.pi*temporal_window/wave_lenght*new_wave)
            wave_date += new_wave
            print "Added a sine wave with %s second period" % wave_lenght
        return wave_date

    option = 2
    if option == 1:
        #####BUILD A SET OF WAVES BY PERIOD IN SECONDS#####
        period_wave_list = [60*60*1,
                     60*30,
                     60*25]
        data = build_waveform(period_wave_list, by_period=True)
        #########
    elif option == 2:    
        #####BUILD A SET OF PERFECT STANDING WAVES#####
        standing_wave_list = [4,8,9,21,88]
        data = build_waveform(standing_wave_list)
        #########
#########################

FINAL UPDATE I PROMISE!

I found it was necessary to display the FFT as a bar chart instead of a line chart for visual clarity. I also fixed the divide by zero error (had to slice the arrays upon their creation using "[1:]" syntax). So I'll add the code here but I'm removing the diagnostics and data stuff (you can copy and past from the previous code). Anyways I think this looks MUCH clearer:

>>>#standing_wave_list = [4,8,9,21,88]
Added a sine wave with 10503.5 second period
Added a sine wave with 5251.75 second period
Added a sine wave with 4668.22222222 second period
Added a sine wave with 2000.66666667 second period
Added a sine wave with 477.431818182 second period
Peak found at 5251.75 second period (87.5291666667 minutes)

enter image description here

import numpy as np
import numpy.fft as fft
import matplotlib.pyplot as plt

#data = np.array(just copy and past from previous code)
temporal_window = 42014.0 #seconds

N = len(data) #datapoints
T = 60/(temporal_window/N) #Cycles per minute

###Diagnostic Override###
    #REMOVED
#########################
    
a=np.abs(fft.rfft(data, n=data.size))[1:]
freqs = fft.rfftfreq(data.size, d=1./T)[1:]
freqs = np.divide(60,freqs)

max_freq = freqs[np.argmax(a)]
print "Peak found at %s second period (%s minutes)" % (max_freq, max_freq/60)
plt.subplot(211,axisbg='black')
plt.bar(freqs,a,edgecolor="gray",linewidth=2)
plt.plot(freqs,a, 'r--')
plt.grid(b=True, color='w')

plt.subplot(212,axisbg='black')
plt.plot(np.linspace(0,temporal_window,data.size),data,'r')
plt.grid(b=True,axis="y", color='w')
plt.show()

Upvotes: 2

Related Questions