tatulea

Reputation: 113

How to use numpy for large sets of data

I have a really large set of data points (at least 1 million). I am using pyFFTW to compute the FFT. To get the x-axis values, I am using x = np.linspace(0.0, 1.0 / (2.0 * T), len(fft_data))

I need to return all the FFT values as a list of lists (e.g. [[x1, y1], [x2, y2]]).

I am using this code:

result = []
for i, item in enumerate(x):
    # pair each x value with the normalized FFT magnitude at the same index
    result.append([item, 2.0 / N * abs(fft_data[i])])

The problem is that my for loop has to iterate over 500,000 elements, and it is not as fast as I would like: it takes about 13 s on my computer. Is there any way to do this faster? I am thinking of using NumPy, but I don't have much experience with it.

One improvement I was able to make was to check whether 2.0 / N * abs(fft_data[i]) is lower than 0.001. I don't need to return values that small, because they are irrelevant for my application.
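A minimal sketch of that filtered loop, assuming the same N, x, fft_data, and the 0.001 threshold described above:

result = []
for i, item in enumerate(x):
    value = 2.0 / N * abs(fft_data[i])
    if value >= 0.001:  # skip magnitudes too small to matter for my application
        result.append([item, value])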

Do you have any idea how I can speed up the algorithm?

Upvotes: 0

Views: 60

Answers (2)

Peter

Reputation: 13485

Vectorize!

result = np.array([x, np.abs(fft_data) * 2.0/N])
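This gives a 2 × len(x) array; to match the [[x1, y1], ...] layout asked for in the question, it would presumably also be transposed, e.g.:

result = np.array([x, np.abs(fft_data) * 2.0 / N]).T  # shape (len(x), 2): one [x_i, y_i] pair per row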

Upvotes: 1

rafaelc

Reputation: 59274

IIUC, just

y = 2.0 / N * np.abs(fft_data)

and hstack

np.hstack([x.reshape(-1,1),
           y.reshape(-1,1)])
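If the very small values are to be dropped, as mentioned in the question, a boolean mask could presumably be applied before stacking (a sketch reusing x, y, and the question's 0.001 threshold):

mask = y >= 0.001                       # keep only magnitudes relevant for the application
result = np.hstack([x[mask].reshape(-1, 1),
                    y[mask].reshape(-1, 1)])
pairs = result.tolist()                 # plain Python list of [x_i, y_i] pairs, if needed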

Upvotes: 2
