Reputation: 949
I'm reading a reference [1] on data analysis with Python and testing the code on my laptop. The text discusses how using NumPy arrays can speed things up compared to using built-in lists.
I'm surprised, however, to get exactly the opposite result:
In [5]: L =range(10000000); %timeit sum(L)
1 loops, best of 3: 201 ms per loop
In [9]: xL=np.array(L,dtype=int); %timeit sum(xL)
1 loops, best of 3: 6.79 s per loop
The first sum is supposed to be much slower than the second, and changing the dtype
option value doesn't change the result.
I'm using the IPython (2.4.0) notebook with Firefox on OS X 10.6.8. Could this be a problem with my (old) Python/OS versions?
[1] "Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data", Zeljko Ivezic et al., Princeton Univ. Press 2014. Appendix A.8.
Upvotes: 0
Views: 63
Reputation: 363616
You're using the Python builtin sum
on the NumPy array instead of NumPy's sum:
>>> import numpy as np
>>> L = range(10000000)
>>> timeit sum(L)
10 loops, best of 3: 69.9 ms per loop
>>> xL = np.array(L, dtype=int)
>>> timeit sum(xL)
1 loops, best of 3: 715 ms per loop
Slooooow! Here's the 10x speedup:
>>> timeit xL.sum()
100 loops, best of 3: 7.34 ms per loop
>>> timeit np.sum(xL)
100 loops, best of 3: 7.38 ms per loop
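The difference comes from how the iteration runs: the builtin sum pulls elements out of the array one at a time as Python objects, while ndarray.sum does a single vectorized reduction in C. A minimal sketch using the stdlib timeit module (my own illustration, not from the question; exact timings will vary by machine):

```python
import timeit
import numpy as np

xL = np.arange(10_000_000)  # same values as np.array(range(10000000))

# Builtin sum: iterates element by element, boxing each value
# back into a Python object -- slow on a NumPy array.
t_builtin = timeit.timeit(lambda: sum(xL), number=1)

# ndarray.sum: one vectorized C loop over the raw buffer.
t_numpy = timeit.timeit(lambda: xL.sum(), number=1)

# Both give the same answer; only the speed differs.
assert sum(xL) == xL.sum()
print(f"builtin sum: {t_builtin:.3f}s, ndarray.sum: {t_numpy:.3f}s")
```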
Upvotes: 2
Reputation: 881027
You need to call the NumPy array's sum
method, not the plain Python builtin sum
function, in order to take advantage of NumPy:
In [32]: L =range(10000000)
In [33]: %timeit sum(L)
10 loops, best of 3: 82.4 ms per loop
In [34]: xL=np.array(L,dtype=int)
In [35]: %timeit xL.sum()
100 loops, best of 3: 9.49 ms per loop
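As a side note (my addition, not part of the answer above), you can also skip the intermediate Python list entirely: np.arange builds the array directly in C, so there is no range-to-array conversion step at all.

```python
import numpy as np

# Build the array directly instead of converting a Python range/list.
xL = np.arange(10_000_000, dtype=int)

# In IPython you would then time it with: %timeit xL.sum()
print(xL.sum())  # 49999995000000
```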
Upvotes: 2