Reputation: 73
I have a pretty simple example which shows that NumPy's np.exp is about 10x slower than MATLAB. How can I speed up Python? I'm running 32-bit Python 2.7 with NumPy 1.11.3, and NumPy is using the MKL BLAS and LAPACK libraries.
Also, the difference in time is so large that I don't think the timing mechanism is having a big effect.
Code example in Python:
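from __future__ import print_function  # print() behaves the same on Python 2.7 and 3
import numpy as np
import timeit
# numexpr must be installed, because the setup string imports it
setup = 'import numpy as np; import numexpr as ne; n=100*1000; a = np.random.uniform(size=n)'
# Each figure is the total time for 1000 calls, not a per-call time
elapsed = timeit.timeit('b=np.exp(a)', setup=setup, number=1000)
print('Time for 1000 (np.exp): ', elapsed)
elapsed = timeit.timeit('b=ne.evaluate("exp(a)")', setup=setup, number=1000)
print('Time for 1000 (numexpr): ', elapsed)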
Results in:
Time for 1000 (np.exp): 2.25906916167
Time for 1000 (numexpr): 0.591470532849
In MATLAB:
a = rand([100*1000,1]);
times = [];
for i=1:1000,
    tic
    b = exp(a);
    t=toc;
    times(i) = t;
end
fprintf('Time for 1000: %f\n',sum(times));
Resulting in:
Time for 1000: 0.268527
Upvotes: 4
Views: 2600
Reputation: 221534
To improve performance, especially on large datasets, we can use the numexpr module for transcendental functions such as exp:
import numexpr as ne
b = ne.evaluate('exp(a)')
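A minimal sketch of wrapping this so the same code still runs where numexpr is not installed. The helper name fast_exp is hypothetical, and the fallback to np.exp is an assumption of this sketch, not part of the answer's recommendation. Passing the array through local_dict avoids relying on numexpr looking up `a` in the caller's namespace.

```python
import numpy as np

try:
    import numexpr as ne  # optional dependency; may not be installed
except ImportError:
    ne = None


def fast_exp(a):
    """Element-wise exp(a) via numexpr when available, else via np.exp.

    fast_exp is a hypothetical helper name used only for illustration.
    """
    a = np.asarray(a, dtype=np.float64)
    if ne is not None:
        # Bind the expression variable explicitly instead of via frame lookup.
        return ne.evaluate("exp(x)", local_dict={"x": a})
    return np.exp(a)


a = np.random.uniform(size=100 * 1000)
b = fast_exp(a)
print(np.allclose(b, np.exp(a)))
```

Whichever path runs, the result should agree with np.exp to within floating-point tolerance; only the speed differs.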
For proper benchmarking, I would use timeit in MATLAB and IPython's %timeit magic for NumPy:
Set #1
MATLAB :
>> a = rand([100*1000,1]);
>> func = @() exp(a);
>> timeit(func)
ans =
0.0013 % That's 1.3 ms
NumPy on an identically sized dataset:
In [417]: n=100*1000
...: a = np.random.uniform(size=n)
...:
In [418]: %timeit np.exp(a)
1000 loops, best of 3: 1.5 ms per loop
In [419]: %timeit ne.evaluate('exp(a)')
1000 loops, best of 3: 397 µs per loop
Thus:
MATLAB : 1.3 ms
NumPy : 1.5 ms
Numexpr : 0.4 ms
Set #2
MATLAB :
>> a = rand([1000*10000,1]);
>> func = @() exp(a);
>> timeit(func)
ans =
0.0977 % That's 97.7 ms
NumPy :
In [412]: n=1000*10000
...: a = np.random.uniform(size=n)
...:
In [413]: %timeit np.exp(a)
10 loops, best of 3: 154 ms per loop
In [414]: %timeit ne.evaluate('exp(a)')
10 loops, best of 3: 36.5 ms per loop
Thus:
MATLAB : 97.7 ms
NumPy : 154 ms
Numexpr : 36.5 ms
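Outside IPython, a comparable "best of 3" number can be scripted with the standard-library timeit module. This is a sketch assuming Python 3.6+ (Timer.autorange does not exist in the question's Python 2.7), and it only approximates what %timeit does rather than reproducing IPython's exact algorithm.

```python
import timeit

import numpy as np

n = 100 * 1000
a = np.random.uniform(size=n)

timer = timeit.Timer(lambda: np.exp(a))

# autorange() tries 1, 2, 5, 10, 20, 50, ... calls per batch and returns the
# first batch size whose total time is at least 0.2 s (Python 3.6+).
number, _ = timer.autorange()

# Run that batch 3 times and keep the fastest run: the "best of 3" figure.
totals = timer.repeat(repeat=3, number=number)
best_per_call = min(totals) / number

print("%d loops, best of 3: %.3f ms per loop" % (number, best_per_call * 1e3))
```

Taking the minimum rather than the mean is a deliberate choice: the fastest run is the one least disturbed by other activity on the machine.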
Proper benchmarking with tic-toc
The flaw in the question's benchmark is that tic/toc wraps each individual exp call, and a single call (about 1 ms here) is far too short an interval to time accurately. The generally accepted guideline is that the interval measured by tic/toc should be at least close to 1 second.
So, with that correction, a more accurate tic-toc test times the whole loop at once and divides by the number of iterations:
tic
for i=1:1000,
    b = exp(a);
end
t=toc;
timing = t./1000
This yields:
timing =
0.0010
This is close to the 1.3 ms we measured with timeit.
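For comparison, a sketch of the same batched-timing idea on the NumPy side: read the clock once around the whole loop and divide by the iteration count. It uses timeit.default_timer so it runs on both Python 2.7 and 3; the loop count of 1000 simply mirrors the MATLAB loop above.

```python
from timeit import default_timer as timer  # the platform's preferred benchmark clock

import numpy as np

a = np.random.uniform(size=100 * 1000)
loops = 1000

# One clock read before and one after the whole batch (like tic ... toc around
# the loop), instead of timing each short np.exp call on its own.
start = timer()
for _ in range(loops):
    b = np.exp(a)
elapsed = timer() - start

timing = elapsed / loops  # average seconds per np.exp call
print("timing = %.4f" % timing)
```

Unlike the best-of-N figure from %timeit, this is an average over all iterations, so it is slightly more sensitive to background load.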
Upvotes: 6