Isaac Asher

Reputation: 73

Why is NumPy's exp slower than MATLAB's? How can I make it faster?

I have a simple example in which NumPy's np.exp is about 10x slower than MATLAB's exp. How can I speed up the Python version? I'm running 32-bit Python 2.7 with NumPy 1.11.3, and NumPy is using the MKL BLAS and LAPACK libraries.

The difference is large enough that I don't think the timing mechanism itself accounts for it.

Code example in Python:

import timeit

# Build the 100,000-element input once; setup code is not included in the timing.
setup = 'import numpy as np; import numexpr as ne; n=100*1000; a = np.random.uniform(size=n)'

# Total wall time for 1000 calls of each variant (Python 2 print syntax).
elapsed = timeit.timeit('b=np.exp(a)', setup=setup, number=1000)
print 'Time for 1000 (np.exp): ', elapsed
elapsed = timeit.timeit('b=ne.evaluate("exp(a)")', setup=setup, number=1000)
print 'Time for 1000 (numexpr): ', elapsed

Results in:

Time for 1000 (np.exp):  2.25906916167
Time for 1000 (numexpr):  0.591470532849

In Matlab:

a = rand([100*1000,1]);
times = [];
for i=1:1000,
    tic
    b = exp(a);
    t=toc;
    times(i) = t;
end

fprintf('Time for 1000: %f\n',sum(times));

Resulting in:

Time for 1000: 0.268527

Upvotes: 4

Views: 2600

Answers (1)

Divakar

Reputation: 221534

To improve performance, especially on large datasets, we can use the numexpr module for transcendental functions such as exp:

import numexpr as ne

b = ne.evaluate('exp(a)')
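As a minimal sketch of how this could be wrapped for reuse: the helper name `fast_exp` is hypothetical (not part of NumPy or numexpr), and the fallback to `np.exp` when numexpr is not installed is an assumption added for illustration. It passes the array explicitly through `local_dict` rather than relying on numexpr looking up `a` in the calling frame, and optionally writes into a preallocated `out` buffer.

```python
import numpy as np

try:
    import numexpr as ne
except ImportError:  # numexpr is an optional dependency
    ne = None


def fast_exp(a, out=None):
    """Element-wise exp via numexpr when available, else np.exp.

    Hypothetical helper for illustration only.
    """
    if ne is not None:
        # Pass the operand explicitly instead of relying on frame lookup.
        return ne.evaluate('exp(a)', local_dict={'a': a}, out=out)
    return np.exp(a, out=out)


a = np.random.uniform(size=100 * 1000)

# Allocate a fresh result array.
b = fast_exp(a)

# Or reuse a preallocated buffer to avoid an allocation on every call.
buf = np.empty_like(a)
fast_exp(a, out=buf)
```

Both paths should agree with `np.exp(a)` to floating-point tolerance, so the wrapper can be swapped in without changing results.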

Benchmarking

For proper benchmarking, I would use timeit in MATLAB and IPython's %timeit for NumPy:

Set #1

MATLAB :

>> a = rand([100*1000,1]);
>> func = @() exp(a);
>> timeit(func)
ans =
    0.0013 % That's 1.3 m-sec

NumPy on an identically sized dataset :

In [417]: n=100*1000
     ...: a = np.random.uniform(size=n)
     ...: 

In [418]: %timeit np.exp(a)
1000 loops, best of 3: 1.5 ms per loop

In [419]: %timeit ne.evaluate('exp(a)')
1000 loops, best of 3: 397 µs per loop

Thus,

MATLAB  : 1.3 m-sec
NumPy   : 1.5 m-sec
Numexpr : 0.4 m-sec

Set #2

MATLAB :

>> a = rand([1000*10000,1]);
>> func = @() exp(a);
>> timeit(func)
ans =
    0.0977  % That's 97 m-sec

NumPy :

In [412]: n=1000*10000
     ...: a = np.random.uniform(size=n)
     ...: 

In [413]: %timeit np.exp(a)
10 loops, best of 3: 154 ms per loop

In [414]: %timeit ne.evaluate('exp(a)')
10 loops, best of 3: 36.5 ms per loop

Thus,

MATLAB  :  97 m-sec
NumPy   : 154 m-sec
Numexpr :  36 m-sec
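To put the two sets side by side, the ratios can be worked out directly from the measurements above. This is only arithmetic on the reported numbers (using the unrounded timeit and %timeit outputs), not a new benchmark:

```python
# Timings in milliseconds, copied from the two benchmark sets above.
timings_ms = {
    'Set #1 (1e5 elements)': {'MATLAB': 1.3, 'NumPy': 1.5, 'Numexpr': 0.397},
    'Set #2 (1e7 elements)': {'MATLAB': 97.7, 'NumPy': 154.0, 'Numexpr': 36.5},
}

speedups = {}
for label, t in timings_ms.items():
    # Ratio > 1 means numexpr was faster on that set.
    speedups[label] = {
        'vs NumPy': t['NumPy'] / t['Numexpr'],
        'vs MATLAB': t['MATLAB'] / t['Numexpr'],
    }
    print('%s: numexpr is %.1fx faster than np.exp, %.1fx faster than MATLAB'
          % (label, speedups[label]['vs NumPy'], speedups[label]['vs MATLAB']))
```

On this machine numexpr comes out roughly 4x faster than np.exp and about 3x faster than MATLAB at both sizes, while np.exp and MATLAB are much closer to each other than the 10x gap in the question suggests.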

Proper benchmarking with tic-toc

The fault in the question's benchmark is that tic and toc are called inside the loop, so each measurement covers only a single exp call, which does not run long enough to give an accurate timing. The generally accepted guideline is that the elapsed time reported by toc should be at least close to the 1 sec mark.

With that correction, a more accurate tic-toc timing test would be:

tic
for i=1:1000,
    b = exp(a);
end
t=toc;
timing = t./1000

This yields:

timing =
    0.0010

That is 1.0 m-sec per call, which is close to the 1.3 m-sec measured with timeit.
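The same principle can be applied on the Python side outside IPython: time a whole batch of calls, divide by the batch size, and keep the best of a few repeats. The sketch below uses the standard-library `timeit.repeat`; the helper name `per_call_seconds` and the chosen `number`/`repeat` values are assumptions for illustration, not part of the original benchmark.

```python
import timeit

import numpy as np


def per_call_seconds(func, number=1000, repeat=5):
    """Best-of-`repeat` average time per call, in seconds.

    Each repeat times `number` back-to-back calls as one interval, so the
    measured span is long compared with the timer's resolution.
    """
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


a = np.random.uniform(size=100 * 1000)

# Smaller batch than the default to keep this example quick to run.
t = per_call_seconds(lambda: np.exp(a), number=200, repeat=3)
print('np.exp: %.3f ms per call' % (t * 1e3))
```

Taking the minimum rather than the mean is the usual choice here, since slower repeats mostly reflect interference from other processes rather than the code under test.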

Upvotes: 6
