I was playing around with benchmarking numpy arrays because I was getting slower-than-expected results when I tried to replace Python lists with numpy arrays in a script.
I know I'm missing something, and I was hoping someone could clear up my ignorance.
I created two functions and timed them:
import numpy as np

NUM_ITERATIONS = 1000

def np_array_addition():
    np_array = np.array([1, 2])
    for x in xrange(NUM_ITERATIONS):
        np_array[0] += x
        np_array[1] += x

def py_array_addition():
    py_array = [1, 2]
    for x in xrange(NUM_ITERATIONS):
        py_array[0] += x
        py_array[1] += x
Results:
np_array_addition: 2.556 seconds
py_array_addition: 0.204 seconds
What gives? What's causing the massive slowdown? I figured that since I was using statically sized arrays, numpy would be at least as fast.
Thanks!
It kept bothering me that numpy array access was slow, and I figured, "Hey, they're just arrays in memory, right? Cython should solve this!"
And it did. Here's my revised benchmark:
import numpy as np
cimport numpy as np

ctypedef np.int_t DTYPE_t

NUM_ITERATIONS = 200000

def np_array_assignment():
    cdef np.ndarray[DTYPE_t, ndim=1] np_array = np.array([1, 2])
    for x in xrange(NUM_ITERATIONS):
        np_array[0] += 1
        np_array[1] += 1

def py_array_assignment():
    py_array = [1, 2]
    for x in xrange(NUM_ITERATIONS):
        py_array[0] += 1
        py_array[1] += 1
I redefined np_array as cdef np.ndarray[DTYPE_t, ndim=1], which lets Cython index the buffer directly instead of going through the Python object layer.
print(timeit(py_array_assignment, number=3))
# 0.03459
print(timeit(np_array_assignment, number=3))
# 0.00755
That's with the Python function also being compiled by Cython. The timing for that same function in pure Python is:
print(timeit(py_array_assignment, number=3))
# 0.12510
A 17x speedup. Sure, it's a silly example, but I thought it was educational.
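(As a sanity check on the arithmetic: since this loop only ever adds 1, the whole thing also collapses to a single vectorized statement, no Cython required. A Python 3 sketch, with range standing in for xrange:)

```python
import numpy as np

NUM_ITERATIONS = 200000

# Loop version, as in the benchmark above (plain Python, no Cython).
loop_array = np.array([1, 2])
for _ in range(NUM_ITERATIONS):
    loop_array[0] += 1
    loop_array[1] += 1

# Adding 1 a fixed number of times is just one addition of NUM_ITERATIONS.
vec_array = np.array([1, 2]) + NUM_ITERATIONS

assert (loop_array == vec_array).all()  # both are [200001, 200002]
```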
Upvotes: 3
It's not (just) the addition that is slow; it's the element-access overhead. See, for example:
def np_array_assignment():
    np_array = np.array([1, 2])
    for x in xrange(NUM_ITERATIONS):
        np_array[0] = 1
        np_array[1] = 1

def py_array_assignment():
    py_array = [1, 2]
    for x in xrange(NUM_ITERATIONS):
        py_array[0] = 1
        py_array[1] = 1
timeit np_array_assignment()
10000 loops, best of 3: 178 us per loop
timeit py_array_assignment()
10000 loops, best of 3: 72.5 us per loop
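One way to see where that overhead lives: each read of a numpy element has to box the raw machine integer into a fresh numpy scalar object, while the list simply hands back a reference to an existing Python int (a small Python 3 illustration; the code above uses Python 2):

```python
import numpy as np

np_array = np.array([1, 2])
py_array = [1, 2]

# The list returns the stored Python int object directly.
print(type(py_array[0]))   # <class 'int'>

# The numpy array must wrap its raw C integer in a new scalar object
# on every single access (typically numpy.int64 on 64-bit platforms).
print(type(np_array[0]))

# Writes pay a similar toll: the right-hand side is unboxed and
# type-checked before the raw value is stored back into the buffer.
np_array[0] = 1
```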
Numpy is fast when operating on whole vectors (matrices) at once; single element-by-element operations like these are slow.
Use numpy functions to avoid Python-level looping, performing the operation on the whole array at once, i.e.:
def np_array_addition_good():
    np_array = np.array([1, 2])
    np_array += np.sum(np.arange(NUM_ITERATIONS))
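The rewrite is correct because adding every x in range(NUM_ITERATIONS) one at a time is the same as adding their sum once; a quick equivalence check (Python 3 sketch, range in place of xrange):

```python
import numpy as np

NUM_ITERATIONS = 1000

# Element-by-element loop, as in the question.
looped = np.array([1, 2])
for x in range(NUM_ITERATIONS):
    looped[0] += x
    looped[1] += x

# One whole-array operation: sum(0 .. NUM_ITERATIONS-1) added once.
vectorized = np.array([1, 2]) + np.arange(NUM_ITERATIONS).sum()

assert (looped == vectorized).all()  # both are [499501, 499502]
```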
The results comparing your functions with the one above are pretty revealing:
timeit np_array_addition()
1000 loops, best of 3: 1.32 ms per loop
timeit py_array_addition()
10000 loops, best of 3: 101 us per loop
timeit np_array_addition_good()
100000 loops, best of 3: 11 us per loop
But actually, you can do just as well in pure Python if you collapse the loop:
def py_array_addition_good():
    py_array = [1, 2]
    rangesum = sum(range(NUM_ITERATIONS))
    py_array = [x + rangesum for x in py_array]
timeit py_array_addition_good()
100000 loops, best of 3: 11 us per loop
All in all, with such simple operations there is really no benefit to using numpy; optimized pure Python works just as well.
There have been a lot of questions about this, and I suggest looking at some of the good answers there:
How do I maximize efficiency with numpy arrays?
numpy float: 10x slower than builtin in arithmetic operations?
Upvotes: 5
You're not actually using numpy's vectorized array addition if you do the loop in Python; there's also the access overhead mentioned by @shashkello.
I took the liberty of increasing the array size a tad, and also adding a vectorized version of the addition:
import numpy as np
from timeit import timeit

NUM_ITERATIONS = 1000

def np_array_addition():
    np_array = np.array(xrange(1000))
    for x in xrange(NUM_ITERATIONS):
        for i in xrange(len(np_array)):
            np_array[i] += x

def np_array_addition2():
    np_array = np.array(xrange(1000))
    for x in xrange(NUM_ITERATIONS):
        np_array += x

def py_array_addition():
    py_array = range(1000)
    for x in xrange(NUM_ITERATIONS):
        for i in xrange(len(py_array)):
            py_array[i] += x
print timeit(np_array_addition, number=3) # 4.216162
print timeit(np_array_addition2, number=3) # 0.117681
print timeit(py_array_addition, number=3) # 0.439957
As you can see, the vectorized numpy version wins pretty handily, and the gap only widens as array sizes and/or iteration counts increase.
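To watch the gap grow, here's a rough scaling sketch (Python 3; the ratios, not the absolute times, are the point, and they will vary by machine):

```python
import numpy as np
from timeit import timeit

def per_element(arr, n_iter):
    # Python-level double loop: one boxed access per element per iteration.
    for x in range(n_iter):
        for i in range(len(arr)):
            arr[i] += x

def vectorized(arr, n_iter):
    # One C-level whole-array addition per iteration.
    for x in range(n_iter):
        arr += x

for size in (100, 1000, 10000):
    a, b = np.arange(size), np.arange(size)
    t_slow = timeit(lambda: per_element(a, 10), number=1)
    t_fast = timeit(lambda: vectorized(b, 10), number=1)
    print("size %5d: per-element / vectorized = %.0fx" % (size, t_slow / t_fast))
```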
Upvotes: 4