Lorenz Forvang
Lorenz Forvang

Reputation: 265

Comparing Numpy and Matlab array summation speed

I recently converted a MATLAB script to Python with Numpy, and found that it ran significantly slower. I expected similar performance, so I'm wondering if I'm doing something wrong.

As stripped-down example, I manually sum a geometric series:

MATLAB version:

function s = array_sum(a, array_size, iterations)
    s = zeros(array_size);
    for m = 1:iterations
        s = a + 0.5*s;
    end
end

% benchmark code
array_size = 500
iterations = 500
a = randn(array_size)
f = @() array_sum(a, array_size, iterations);
fprintf('run time: %.2f ms\n', timeit(f)*1e3);

Python/Numpy version:

import numpy as np
import timeit

def array_sum(a, array_size, iterations):
    s = np.zeros((array_size, array_size))
    for m in range(iterations):
        s = a + 0.5*s
    return s

array_size = 500
iterations = 500
a = np.random.randn(array_size, array_size)
timeit_iterations = 10
t1 = timeit.timeit(lambda: array_sum(a, array_size, iterations),
                   number=timeit_iterations)
print("run time: {:.2f} ms".format(1e3*t1/timeit_iterations))

On my machine, MATLAB completes in 58 ms. The Python version runs in 292 ms, or 5X slower.

I also tried speeding up the Python code by adding the Numba JIT decorator @jit('f8[:,:](i8, i8)', nopython=True), but the time only dropped to 236 ms (4X slower).

This is slower than I expected. Am I using timeit improperly? Is there something wrong with my Python code?

EDIT: edited so that the random matrix is created outside of benchmarked function.

EDIT 2: I ran the benchmark using Torch instead of Numpy (calculating the sum as s = torch.add(s, 0.5, a)) and it runs in just 52 ms on my computer!

Upvotes: 12

Views: 859

Answers (3)

seekiu
seekiu

Reputation: 127

From my experience, when using numba's jit function it's usually faster to expand array operations into loops. So I tried to rewrite your python function as:

@jit(nopython=True, cache=True)
def array_sum_numba(a, array_size, iterations):
    s = np.zeros((array_size, array_size))
    for m in range(iterations):
        for i in range(array_size):
            for j in range(array_size):
                s[i,j] = a[i,j] + 0.5 * s[i,j]
    return s

And out of curiosity, I've also tested @percusse's version with a little modification on the parameter:

def array_sum2(r, array_size, iterations):
    s = np.zeros((array_size, array_size))
    for m in range(iterations):
        s /= 2
        s += r
    return s

The testing results on my machine are:

  • original version run time: 143.83 ms
  • numba jitted loop version run time: 26.99 ms
  • @percusse's version run time: 61.38 ms

This result is within my expectation. It's worthing mentioning that I've increased timeit iterations to 50, which results in some significant time reduction for numba version.

In summary: The Python code can still be significantly accelerated if you use numba's jit and write the function in loops. I don't have Matlab on my machine to test, but my guess is with numba the python version is faster.

Upvotes: 3

percusse
percusse

Reputation: 3106

Since you are updating the same variable suitable for inplace operations, you can update your function as

def array_sum2(array_size, iterations):
    s = np.zeros((array_size, array_size))
    r = np.random.randn(array_size, array_size)
    for m in range(iterations):
        s /= 2
        s += r
    return s

This has given the following speed benefit on my machine compared to array_sum

run time: 157.32 ms
run time2: 672.43 ms

Upvotes: 2

hpaulj
hpaulj

Reputation: 231375

Times include the randn call as well as the summation:

In [68]: timeit array_sum(array_size, 0)
16.6 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [69]: timeit array_sum(array_size, 1)
18.9 ms ± 293 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [70]: timeit array_sum(array_size, 20)
55.5 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [71]: (55-16)/20
Out[71]: 1.95

So it's 16ms for the setup, and 2ms per iteration. Same pattern with 500 iterations.

MATLAB does some JIT compilation. I don't know if that's the case here or not. I don't have MATLAB to test. In Octave (no timeit)

>> t = time(); array_sum(500,0); (time()-t)*1000
ans =  13.704
>> t = time(); array_sum(500,1); (time()-t)*1000
ans =  16.219
>> t = time(); array_sum(500,20); (time()-t)*1000
ans =  82.346
>> t = time(); array_sum(500,500); (time()-t)*1000
ans =  1610.6

Octave's random is faster, but the per iteration sum is slower.

Upvotes: 0

Related Questions