How to interpret this fft graph

Question

I want to apply a Fourier transform, using the fft function, to my time series data to find "patterns" by extracting the dominant frequency components in the observed data, i.e., the lowest 5 dominant frequencies, to predict the y value (bacteria count) at the end of each time series. I would like to keep the smallest 5 coefficients as features and eliminate the rest.

My code is as below:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('/content/drive/My Drive/df.csv', sep=',')
X = df.iloc[0:2, 0:10000]

dft_X = np.fft.fft(X)
print(dft_X)
print(len(dft_X))
plt.plot(dft_X)
plt.grid(True)
plt.show()

# What is the graph about (freq/amplitude)? How much data did it use?
for i in dft_X:
    # m and n are overwritten on each pass, so only the last row's values survive
    m = i[np.argpartition(i, 5)[:5]]
    n = i[np.argpartition(i, range(5))[:5]]

print(m, '\n', n)

Here is the output:

But I am not sure how to interpret this graph. Specifically:

1) Does the graph show the transformed values of the input data? I only used 2 rows of data (each row is a time series), so the data is 2x10000. Why are there so many lines in the graph?

2) To obtain frequency value, should I use np.fft.fftfreq(n, d=timestep)?

Parameters:
n : int
    Window length.
d : scalar, optional
    Sample spacing (inverse of the sampling rate). Defaults to 1.

Returns:
f : ndarray
    Array of length n containing the sample frequencies.

How do I determine n (the window length) and the sample spacing?

3) Why are transformed values all complex numbers?

Thanks

Mimakari · Accepted Answer

I'll answer your questions in reverse order.

3) Why are transformed values all complex numbers?

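In general, the Fourier transform of a signal is complex: each coefficient carries both an amplitude (its magnitude) and a phase (its angle), and np.fft.fft always returns a complex array, even for real input. To plot it, you can either take the absolute value (magnitude) of the output with np.abs, or plot only the real part using: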

plt.plot(dft_X.real)
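A small sketch of the difference, using a made-up 8-sample test signal (x below is hypothetical, not the asker's data). np.abs gives the magnitude sqrt(real² + imag²) of each coefficient, while .real keeps only one component, so a sine component, whose coefficients are purely imaginary, disappears from the real part entirely.

```python
import numpy as np

# Hypothetical test signal: a cosine at bin 1 plus a half-amplitude sine at bin 2.
n = np.arange(8)
x = np.cos(2 * np.pi * n / 8) + 0.5 * np.sin(2 * np.pi * 2 * n / 8)

X = np.fft.fft(x)          # complex array, even though x is real
magnitude = np.abs(X)      # sqrt(X.real**2 + X.imag**2): both components
phase = np.angle(X)        # angle of each coefficient, in radians
real_part = X.real         # only one component of each coefficient

print(X.dtype)             # complex128
print(magnitude)           # non-zero at bins 1, 2, 6, 7
print(real_part)           # the sine at bins 2 and 6 is missing here
```

The magnitude keeps both components of the signal; the real part only shows the cosine.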

2) To obtain frequency value, should I use np.fft.fftfreq(n, d=timestep)?

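Yes. The FFT output contains one coefficient per frequency bin, but it does not tell you which frequency each bin corresponds to. np.fft.fftfreq(n, d=timestep) gives those frequencies: n is the number of samples in the series you transformed (10000 per row here), and d is the time between consecutive samples, in the unit you want the frequencies expressed in (for example, d in seconds gives frequencies in Hz).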
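A minimal sketch of how fftfreq labels the FFT output, assuming a synthetic series sampled every 0.5 s (the timestep value and the 0.05 Hz test frequency are made up for illustration). The frequency resolution is 1/(n*d), and the strongest bin among the non-negative frequencies recovers the test frequency.

```python
import numpy as np

timestep = 0.5                      # hypothetical sample spacing in seconds
n = 10000                           # number of samples in one series (window length)
t = np.arange(n) * timestep
x = np.sin(2 * np.pi * 0.05 * t)    # hypothetical 0.05 Hz signal

coeffs = np.fft.fft(x)
freqs = np.fft.fftfreq(n, d=timestep)   # frequency (Hz) of each coefficient

# Look only at the non-negative frequencies and find the strongest bin.
half = n // 2
peak = np.argmax(np.abs(coeffs[:half]))
print(peak, freqs[peak])                # bin index and its frequency in Hz
```

For real input, np.fft.rfft together with np.fft.rfftfreq returns only the non-negative half directly.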

1) Does the graph show the transformed values of the input data? I only used 2 rows of data (each row is a time series), so the data is 2x10000. Why are there so many lines in the graph?

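Your graph has so many lines because plt.plot draws one line per column when it is given a 2-D array. np.fft.fft(X) already transforms each row separately (it works along the last axis by default), so dft_X has shape (2, 10000), and plt.plot turns that into 10000 lines of 2 points each. Plot each row on its own (for example plt.plot(np.abs(dft_X[0]))) or plot the transpose, dft_X.T, and you'll get one frequency-domain curve per time series.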
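To see where the lines come from, here is a sketch with random numbers standing in for the two series (the random array is an assumption, not the real dataset, and the Agg backend is only there so it runs without a display). It counts the Line2D objects plt.plot creates for the (2, 10000) array and for its transpose.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                 # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 10000))   # stand-in for df.iloc[0:2, 0:10000]

dft_X = np.fft.fft(X)                 # default axis=-1: one FFT per row
print(dft_X.shape)                    # (2, 10000)

fig, ax = plt.subplots()
lines_wrong = ax.plot(np.abs(dft_X))      # one line per column: 10000 two-point lines
ax.clear()
lines_right = ax.plot(np.abs(dft_X).T)    # one line per time series: 2 lines
print(len(lines_wrong), len(lines_right))
```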

Follow up

Would using the absolute value or the real part of the output as features for a later model have a different effect than using the original output?

Absolute values are usually easier to work with.

[Figures: "Using real part", "Using absolute value"]

Here's the Octave code that generated these plots:

Fs = 4000;                          % Sampling rate of signal
T  = 1/Fs;                          % Period
L  = 4000;                          % Length of signal
t  = (0:L-1)*T;                     % Time axis

freq = 1000;                        % Frequency of our sinusoid

sig   = sin(freq*2*pi*t);           % Fill Time-Domain with 1000 Hz sinusoid
f_sig = fft(sig);                   % Apply FFT

f = Fs*(0:(L/2))/L;                 % Frequency axis

figure
  plot(f,abs(f_sig/L)(1:end/2+1));  % magnitude: single peak at 1 kHz
figure 
  plot(f,real(f_sig/L)(1:end/2+1)); % real part only

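In my example, the absolute value shows a single clean peak (height 0.5) at the 1 kHz sinusoid I generated and nothing at other frequencies. The real part is much less informative: the FFT coefficient of a pure sine is purely imaginary, so the real part is essentially zero everywhere, and what shows up in that plot is floating-point round-off noise. More generally, the real part changes with the signal's phase while the magnitude does not, which is why the absolute value is usually the better feature.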
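For readers working in Python, here is a rough NumPy translation of the Octave example (same 4000 Hz sampling rate, 4000 samples, 1 kHz sine), as a sketch for checking what each plot contains. It prints the magnitude peak and the largest real-part value on the one-sided frequency axis.

```python
import numpy as np

Fs = 4000                      # sampling rate (Hz)
L = 4000                       # length of signal (samples)
t = np.arange(L) / Fs          # time axis
freq = 1000                    # frequency of the sinusoid (Hz)

sig = np.sin(2 * np.pi * freq * t)
f_sig = np.fft.fft(sig) / L    # normalized FFT, as in the Octave code

f = Fs * np.arange(L // 2 + 1) / L     # frequency axis, 0 .. Fs/2
mag = np.abs(f_sig[: L // 2 + 1])      # what the "absolute value" plot shows
re = f_sig.real[: L // 2 + 1]          # what the "real part" plot shows

print(f[np.argmax(mag)], mag.max())    # peak frequency (Hz) and peak height
print(np.abs(re).max())                # tiny: only round-off remains
```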

As for effects, I don't know what you mean by that.

Is it expected that the "frequency values" are always complex numbers?

Always? Not necessarily. A Fourier series expresses a periodic function as a sum of sines and cosines, and the Fourier coefficients are the weights of those terms. Through Euler's formula, the sine and cosine at one frequency can be combined into a complex exponential, so storing each coefficient as one complex number is the most convenient representation. The magnitude of a coefficient is the amplitude at that frequency, and its angle (not the imaginary part alone) is the phase: two sine functions of the same frequency that are shifted in time have the same magnitude but different phases, so their complex coefficients differ. Some coefficients can be purely real (for a real input signal, the zero-frequency term always is), but most libraries that provide an FFT function return every coefficient as a complex number by default, to make phase and magnitude calculations easy.
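To illustrate the time-shift remark, a sketch with a hypothetical 64-sample sine at frequency bin 4 and a copy delayed by 3 samples (all values chosen only for illustration). The coefficient at bin 4 keeps the same magnitude, N/2 = 32, but its angle changes by -2π·4·3/64 = -3π/8.

```python
import numpy as np

N = 64
n = np.arange(N)
k0 = 4                                     # hypothetical frequency bin
a = np.sin(2 * np.pi * k0 * n / N)         # sine
b = np.sin(2 * np.pi * k0 * (n - 3) / N)   # same sine, delayed by 3 samples

A, B = np.fft.fft(a), np.fft.fft(b)

print(np.abs(A[k0]), np.abs(B[k0]))        # same magnitude
print(np.angle(A[k0]), np.angle(B[k0]))    # different phases: -pi/2 and -7*pi/8
```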

Is it a convention that the FFT uses each column of the dataset when plotting a line?

It comes from matplotlib's plt.plot, not np.fft: when plt.plot is given a 2-D array, it draws a separate line for each column.

Could you please show me how to apply the FFT to each row separately?

There are many ways to go about this, and I don't want to force you down one path, so I'll propose the general solution: iterate over each row of your dataframe and apply the FFT to that row. In your case, transposing the output before plotting should also work.
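One possible way to do the per-row loop and build features like the ones described in the question. The helper name top_k_frequencies, k=5, skipping the 0 Hz (mean) term, ranking by rfft magnitude, and the synthetic two-row dataframe are all assumptions for illustration, not the only reasonable choices. Because each row is a real series, np.fft.rfft returns only the non-negative frequencies.

```python
import numpy as np
import pandas as pd

def top_k_frequencies(series, d=1.0, k=5):
    """Hypothetical helper: return the k strongest frequencies and their magnitudes."""
    x = np.asarray(series, dtype=float)
    coeffs = np.fft.rfft(x)                  # non-negative frequencies only
    freqs = np.fft.rfftfreq(x.size, d=d)     # frequency of each rfft bin
    mags = np.abs(coeffs)
    mags[0] = 0.0                            # ignore the mean (0 Hz) term
    idx = np.argsort(mags)[::-1][:k]         # indices of the k largest magnitudes
    return freqs[idx], mags[idx]

# Stand-in data: two rows, each a sum of known sinusoids (hypothetical).
t = np.arange(1000)
df = pd.DataFrame([
    np.sin(2 * np.pi * 0.01 * t) + 0.5 * np.sin(2 * np.pi * 0.1 * t),
    np.sin(2 * np.pi * 0.05 * t),
])

features = []
for _, row in df.iterrows():                 # one FFT per row
    freqs, mags = top_k_frequencies(row, d=1.0, k=5)
    features.append(np.concatenate([freqs, mags]))

features = np.array(features)                # shape (n_rows, 2 * k)
print(features.shape)
```

If "smallest 5 coefficients" in the question instead means the five lowest-frequency coefficients, taking coeffs[1:6] (skipping 0 Hz) would be the corresponding choice; which interpretation fits depends on what the model needs.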
