Reputation: 949
I'm reading a reference [1] on data analysis with Python and testing the code on my laptop. The text discusses how using NumPy arrays can speed things up compared to using built-in lists.
I'm surprised, however, to get exactly the opposite result:
In [5]: L =range(10000000); %timeit sum(L)
1 loops, best of 3: 201 ms per loop
In [9]: xL=np.array(L,dtype=int); %timeit sum(xL)
1 loops, best of 3: 6.79 s per loop
The first sum is supposed to be much slower than the second, and changing the dtype
option value doesn't change the result.
I'm using the IPython (2.4.0) notebook with Firefox on OS X 10.6.8. Could this be a problem with my (old) Python/OS versions?
[1] "Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data", Zeljko Ivezic et al., Princeton Univ. Press 2014. Appendix A.8.
Upvotes: 0
Views: 63
Reputation: 363616
You're using the Python builtin sum
on the NumPy array instead of NumPy's sum:
>>> import numpy as np
>>> L = range(10000000)
>>> timeit sum(L)
10 loops, best of 3: 69.9 ms per loop
>>> xL = np.array(L, dtype=int)
>>> timeit sum(xL)
1 loops, best of 3: 715 ms per loop
Slooooow! Here's the 10x speedup:
>>> timeit xL.sum()
100 loops, best of 3: 7.34 ms per loop
>>> timeit np.sum(xL)
100 loops, best of 3: 7.38 ms per loop
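The difference comes from how the iteration runs: the builtin sum pulls elements out of the array one at a time as Python objects, while ndarray.sum does a single vectorized reduction in C. A minimal sketch using the stdlib timeit module (my own illustration, not from the question; exact timings will vary by machine):

```python
import timeit
import numpy as np

xL = np.arange(10_000_000)  # same values as np.array(range(10000000))

# Builtin sum: iterates element by element, boxing each value
# back into a Python object -- slow on a NumPy array.
t_builtin = timeit.timeit(lambda: sum(xL), number=1)

# ndarray.sum: one vectorized C loop over the raw buffer.
t_numpy = timeit.timeit(lambda: xL.sum(), number=1)

# Both give the same answer; only the speed differs.
assert sum(xL) == xL.sum()
print(f"builtin sum: {t_builtin:.3f}s, ndarray.sum: {t_numpy:.3f}s")
```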
Upvotes: 2
Reputation: 881027
You need to call the NumPy array's sum
method, not the plain Python builtin sum
function, in order to take advantage of NumPy:
In [32]: L =range(10000000)
In [33]: %timeit sum(L)
10 loops, best of 3: 82.4 ms per loop
In [34]: xL=np.array(L,dtype=int)
In [35]: %timeit xL.sum()
100 loops, best of 3: 9.49 ms per loop
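As a side note (my addition, not part of the answer above), you can also skip the intermediate Python list entirely: np.arange builds the array directly in C, so there is no range-to-array conversion step at all.

```python
import numpy as np

# Build the array directly instead of converting a Python range/list.
xL = np.arange(10_000_000, dtype=int)

# In IPython you would then time it with: %timeit xL.sum()
print(xL.sum())  # 49999995000000
```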
Upvotes: 2