Reputation: 61
I ran a test script that uses numpy.fft.fft(), anfft.fft() (based on FFTW) and pyfftw.interfaces.numpy_fft.fft() (also based on FFTW).
Here is the source of my test script:
import numpy as np
import anfft
import pyfftw
import time
a = pyfftw.n_byte_align_empty(128, 16, 'complex128')
a[:] = np.random.randn(128) + 1j*np.random.randn(128)
time0 = time.clock()
res1 = np.fft.fft(a)
time1 = time.clock()
res2 = anfft.fft(a)
time2 = time.clock()
res3 = pyfftw.interfaces.numpy_fft.fft(a,threads=50)
time3 = time.clock()
print 'Time numpy: %s' % (time1 - time0)
print 'Time anfft: %s' % (time2 - time1)
print 'Time pyfftw: %s' % (time3 - time2)
And I get these results:
Time numpy: 0.00154248116307
Time anfft: 0.0139805208195
Time pyfftw: 0.137729374893
The anfft library produces a much faster FFT on huge data, but what about pyfftw? Why is it so slow?
Upvotes: 2
Views: 5419
Reputation: 8692
The problem here is the overhead in using the numpy_fft interface. Firstly, you should enable the cache with pyfftw.interfaces.cache.enable(), and then test the result with timeit. Even using the cache, there is a fixed overhead of using the interfaces that is not present if you use the raw interface.
On my machine, on a 128-length array, the overhead of the interface still makes it slower than numpy.fft. As the length increases, this overhead becomes less important, so on, say, a 16000-length array, the numpy_fft interface is faster.
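A minimal sketch of that measurement, assuming pyFFTW is installed. The helper names best_time and compare_interfaces are made up for illustration and are not part of either library; the only pyFFTW calls used are pyfftw.interfaces.cache.enable() and pyfftw.interfaces.numpy_fft.fft().

```python
import timeit

import numpy as np


def best_time(func, number=1000, repeat=5):
    """Best observed per-call time in seconds (hypothetical helper)."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


def compare_interfaces(n):
    """Time np.fft.fft against the pyfftw numpy_fft interface for length n."""
    # Imported here so best_time stays usable without pyFFTW installed.
    import pyfftw.interfaces.cache
    import pyfftw.interfaces.numpy_fft as fftw_fft

    # Keep planned FFTW objects alive between calls instead of rebuilding them.
    pyfftw.interfaces.cache.enable()

    a = np.random.randn(n) + 1j * np.random.randn(n)
    fftw_fft.fft(a)  # warm-up call so planning is not part of the timing

    return {
        "numpy": best_time(lambda: np.fft.fft(a)),
        "pyfftw": best_time(lambda: fftw_fft.fft(a)),
    }
```

Calling compare_interfaces(128) and compare_interfaces(16000) should show how the fixed overhead matters less as the array grows; the actual numbers depend on the machine and the pyFFTW version.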
There are tweaks you can invoke to speed things up on the interfaces end, but these are unlikely to make much difference in your case.
The best way to get the fastest possible transform in all situations is to use the FFTW object directly, and the easiest way to do that is with the builders functions. In your case:
t = pyfftw.builders.fft(a)
timeit t()
With that I get pyfftw being about 15 times faster than np.fft with a 128-length array.
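As a rough illustration of the builders approach outside an interactive session, the sketch below plans the transform once and then checks the result against np.fft.fft. It assumes pyFFTW is installed; build_fft and demo are hypothetical names used only for this example.

```python
import numpy as np


def build_fft(a):
    """Plan a 1-D FFT for arrays shaped like `a` and return the FFTW object."""
    import pyfftw.builders  # assumes pyFFTW is installed

    return pyfftw.builders.fft(a)


def demo(n=128):
    a = np.random.randn(n) + 1j * np.random.randn(n)
    data = a.copy()              # private copy of the input values
    expected = np.fft.fft(data)  # reference result from NumPy

    fft_obj = build_fft(a)       # the planning cost is paid once, here

    # Pass the data explicitly rather than relying on the planner having
    # left the contents of `a` untouched.
    result = fft_obj(data)

    # `result` is the object's internal output array; copy it if it has to
    # survive the next call to fft_obj.
    return bool(np.allclose(result, expected))
```

In a loop, only the fft_obj(...) call needs to be repeated, which is where the saving over the interface functions comes from.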
Upvotes: 4
Reputation: 41
It might be that pyFFTW is actually spending most of its time planning the transform. Try including, for example, planner_effort='FFTW_ESTIMATE' in the pyfftw fft call, and see how that affects the performance.
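For instance, the call from the question could be written as in the sketch below, assuming pyFFTW is installed. The wrapper name fft_with_estimate is made up for this example; planner_effort and threads are keyword arguments of the pyfftw.interfaces.numpy_fft functions.

```python
import numpy as np


def fft_with_estimate(a, threads=1):
    """FFT through the pyfftw numpy_fft interface with the cheapest planner."""
    import pyfftw.interfaces.numpy_fft as fftw_fft  # assumes pyFFTW is installed

    # FFTW_ESTIMATE picks a plan heuristically instead of timing candidates,
    # so the first call returns quickly at the cost of a possibly slower plan.
    return fftw_fft.fft(a, planner_effort='FFTW_ESTIMATE', threads=threads)
```

Whether this helps depends on how often the same transform is repeated: for a single call the planning time dominates, while for many calls a more thorough plan may pay for itself.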
Upvotes: 2
Reputation: 26569
In this case, spawning more threads than you have CPU cores will not give an increase in performance, and will probably make the program slower due to the overhead of switching threads. 50 threads is complete overkill.
Try benchmarking with one thread.
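One way to keep the thread count reasonable is to cap it at the number of cores reported by the operating system, as in this sketch. It assumes pyFFTW is installed; sensible_threads and fft_threaded are hypothetical helper names.

```python
import multiprocessing

import numpy as np


def sensible_threads(requested):
    """Clamp a requested thread count to the range 1..cpu_count()."""
    return max(1, min(requested, multiprocessing.cpu_count()))


def fft_threaded(a, requested_threads=1):
    """FFT through the pyfftw numpy_fft interface with a capped thread count."""
    import pyfftw.interfaces.numpy_fft as fftw_fft  # assumes pyFFTW is installed

    return fftw_fft.fft(a, threads=sensible_threads(requested_threads))
```

For an array as short as 128 elements, fft_threaded(a, 1) is the natural baseline to compare against.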
Upvotes: 5