MASL

Reputation: 949

Efficient coding with Python and NumPy

I'm reading a reference [1] on data analysis with Python and testing the code on my laptop. The text discusses how using NumPy arrays can speed things up compared to using built-in lists.

However, I'm surprised to get exactly the opposite result:

In [5]: L =range(10000000); %timeit sum(L)
1 loops, best of 3: 201 ms per loop

In [9]: xL=np.array(L,dtype=int); %timeit sum(xL)
1 loops, best of 3: 6.79 s per loop

According to the book, the first sum should be much slower than the second. Changing the dtype argument doesn't change the result.
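To reproduce the two timings above without IPython's %timeit, here is a minimal sketch using the standard-library timeit module. It assumes Python 3, where range() is lazy, so a real list is built explicitly (the question's IPython 2.4 session was likely Python 2, where range() already returns a list). The smaller N and the explicit int64 dtype are choices made here, not taken from the question.

```python
import timeit

import numpy as np

# The question uses 10,000,000 elements; a smaller N keeps this sketch quick.
N = 1_000_000

L = list(range(N))                # Python 3's range() is lazy, so build a real list
xL = np.array(L, dtype=np.int64)  # int64 so the total fits even where the default int is 32-bit

for name, data in [("list", L), ("ndarray", xL)]:
    # Best of 3 single calls, roughly like the "best of 3" that %timeit reports
    best = min(timeit.repeat(lambda: sum(data), repeat=3, number=1))
    print(f"built-in sum over {name}: {best * 1e3:.1f} ms")
```

Both lines use the built-in sum, so this reproduces the question's comparison rather than fixing it; the ndarray line should come out slower, as in the question.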

I'm using the IPython 2.4.0 notebook with Firefox on OS X 10.6.8. Could this be a problem with my (old) version of Python or OS X?

[1] Zeljko Ivezic et al., "Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data", Princeton University Press, 2014, Appendix A.8.

Upvotes: 0


Answers (2)

wim

Reputation: 363616

You're using Python's built-in sum on the NumPy array instead of NumPy's own sum:

>>> import numpy as np
>>> L = range(10000000)
>>> timeit sum(L)
10 loops, best of 3: 69.9 ms per loop
>>> xL = np.array(L, dtype=int)
>>> timeit sum(xL)
1 loops, best of 3: 715 ms per loop

Slooooow! Here's the roughly 10x speedup (relative to summing the list):

>>> timeit xL.sum()
100 loops, best of 3: 7.34 ms per loop
>>> timeit np.sum(xL)
100 loops, best of 3: 7.38 ms per loop
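The answer doesn't say why the built-in sum is so slow on the array. A plausible explanation, sketched here rather than profiled: iterating over an ndarray creates a new NumPy scalar object for every element, and the built-in sum then adds those scalars one at a time in the interpreter, whereas xL.sum() runs the whole reduction in compiled code. The helper builtin_style_sum is hypothetical and only spells out what the built-in sum does with the array.

```python
import numpy as np

xL = np.arange(5, dtype=np.int64)

# Iterating an ndarray produces a fresh NumPy scalar for each element,
# here numpy.int64 rather than a plain Python int.
first = next(iter(xL))
print(type(first))


def builtin_style_sum(arr):
    """Hypothetical helper: roughly what the built-in sum() does with an ndarray."""
    total = 0
    for x in arr:          # one NumPy scalar object created per element
        total = total + x  # NumPy scalar arithmetic, dispatched per element
    return total


print(builtin_style_sum(xL), sum(xL), xL.sum())  # prints "10 10 10"
```

All three give the same total; only xL.sum() avoids creating and adding millions of scalar objects one by one.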

Upvotes: 2

unutbu

Reputation: 881027

You need to call the NumPy array's sum method, not Python's built-in sum function, to take advantage of NumPy:

In [32]: L =range(10000000)

In [33]: %timeit sum(L)
10 loops, best of 3: 82.4 ms per loop

In [34]: xL=np.array(L,dtype=int)

In [35]: %timeit xL.sum()
100 loops, best of 3: 9.49 ms per loop
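Going one step beyond the answer (this is an addition, not something it shows): the 10-million-element Python list isn't needed at all, since np.arange can build the same values directly as an array, and np.sum is an equivalent spelling of the method call. A minimal sketch, assuming an explicit int64 dtype so the total fits on every platform:

```python
import numpy as np

N = 10_000_000

# Build the array directly in NumPy instead of converting a Python list
xA = np.arange(N, dtype=np.int64)

total_method = xA.sum()      # ndarray.sum method, as in the answer
total_function = np.sum(xA)  # module-level function; same reduction

# Closed-form check: 0 + 1 + ... + (N - 1) = N * (N - 1) / 2
expected = N * (N - 1) // 2
print(total_method == total_function == expected)  # prints "True"
```

The closed-form check only confirms that both reductions agree with the arithmetic-series formula.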

Upvotes: 2
