user2081884

Reputation: 61

Why is pyfftw, which is based on FFTW, slower than numpy's fft()?

I ran a test script that uses numpy.fft.fft(), anfft.fft() (based on FFTW) and pyfftw.interfaces.numpy_fft.fft() (also based on FFTW).

Here is the source of my test script:

import numpy as np
import anfft
import pyfftw
import time

a = pyfftw.n_byte_align_empty(128, 16, 'complex128')
a[:] = np.random.randn(128) + 1j*np.random.randn(128)

time0 = time.clock()
res1 = np.fft.fft(a)
time1 = time.clock()
res2 = anfft.fft(a)
time2 = time.clock()
res3 = pyfftw.interfaces.numpy_fft.fft(a,threads=50)
time3 = time.clock()

print 'Time numpy: %s' % (time1 - time0)
print 'Time anfft: %s' % (time2 - time1)
print 'Time pyfftw: %s' % (time3 - time2)

And I get these results:

Time numpy: 0.00154248116307
Time anfft: 0.0139805208195
Time pyfftw: 0.137729374893

The anfft library produces faster FFTs on huge data, but what about pyfftw? Why is it so slow?

Upvotes: 2

Views: 5419

Answers (3)

Henry Gomersall

Reputation: 8692

The problem here is the overhead of using the numpy_fft interface. Firstly, you should enable the cache with pyfftw.interfaces.cache.enable(), and then time the result with timeit. Even with the cache enabled, the interfaces carry a fixed per-call overhead that is not present when you use the raw FFTW interface.

On my machine, on a 128-length array, the overhead of the interface still slows it down more than numpy.fft. As the length increases, this overhead becomes less important, so on say a 16000-length array, the numpy_fft interface is faster.

There are tweaks you can invoke to speed things up on the interfaces end, but these are unlikely to make much difference in your case.

The best way to get the fastest possible transform in all situations is to use the FFTW object directly, and the easiest way to do that is with the pyfftw.builders functions. In your case:

t = pyfftw.builders.fft(a)  # plans once; t is a callable pyfftw.FFTW object
%timeit t()                 # IPython magic: times only the execution

With that, I get pyfftw running about 15 times faster than np.fft on a 128-length array.

Upvotes: 4

mhavu

Reputation: 41

It might be that pyFFTW is actually spending most of its time planning the transform. Try passing, for example, planner_effort='FFTW_ESTIMATE' to the pyfftw fft call, and see how that affects performance.

Upvotes: 2

Colonel Thirty Two

Reputation: 26569

In this case, spawning more threads than you have CPU cores will not improve performance, and will probably make the program slower because of the overhead of switching between threads. 50 threads is complete overkill.

Try benchmarking with one thread.

Upvotes: 5
