Reputation: 1
NumPy is known for being faster than built-in containers such as lists and tuples. But with the code below, if I remove the np and use the built-in sum, it takes around 1.8 seconds, whereas with np the sum takes over 21 seconds. Can you explain why?
import numpy as np
import time
start = time.process_time()
p = np.sum(range(1,100000000))
print(time.process_time() - start)
Upvotes: 0
Views: 550
Reputation: 3
It seems that most of the time is spent converting the Python range to an np.array. You can check it:
import numpy as np
import time
start = time.process_time()
p = np.array(range(1,100000000))
print(time.process_time() - start)
It's about 16 seconds on my CPU.
np.sum(range(1,100000000)) first has to convert the range into a NumPy array, and only then computes the sum. I advise you to use np.arange instead of range, which builds the array directly:
import numpy as np
import time
start = time.process_time()
p = np.sum(np.arange(1,100000000))
print(time.process_time() - start)
In this case, the sum was calculated faster than for sum(range(1,100000000)).
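To see where the time goes, the conversion and the summation can be timed separately. This is only a minimal sketch: the size N is reduced from the question's 100,000,000 so it finishes quickly, the helper name timed is made up for illustration, and dtype=np.int64 is set explicitly on the assumption that you want to avoid overflow on platforms where the default integer type is 32-bit. Absolute timings will differ from machine to machine.

```python
import time

import numpy as np

N = 1_000_000  # assumption: smaller than the question's 100,000,000 so this runs quickly


def timed(fn):
    """Hypothetical helper: return (result, elapsed seconds) for one call of fn."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Step 1: convert the Python range to an array (done element by element).
arr_from_range, t_convert = timed(lambda: np.array(range(1, N), dtype=np.int64))

# Step 2: sum the array that has already been built.
total_converted, t_sum = timed(lambda: arr_from_range.sum())

# Alternative: build the array directly with np.arange, then sum it.
total_arange, t_arange = timed(lambda: np.arange(1, N, dtype=np.int64).sum())

print(f"np.array(range(...)): {t_convert:.4f} s")
print(f"sum of built array:   {t_sum:.4f} s")
print(f"np.arange(...).sum(): {t_arange:.4f} s")
```

If the explanation above holds on your machine, the first number should dominate the other two.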
Upvotes: 0
Reputation: 68256
Doing my best to compare apples to apples here, let's fully create a list and a NumPy array of integers from 1 up to (but not including) 1,000,000:
import numpy as np
lst = list(range(1, int(1e6)))
vec = np.arange(1, 1e6, dtype=int)
Now let's use a purpose-built timing utility in jupyter to compare the operations:
%%timeit
sum(lst)
Which gives me:
9.14 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
vec.sum()
Which gives me:
957 µs ± 39.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So it seems to me that if you ensure that both containers hold the same data type and leave the object creation out of the timing, NumPy's performance lives up to its claims.
Upvotes: 0
Reputation: 11
I have low rep so I can't comment, but it may be because of the time to convert the range into a numpy object
I would benchmark with something like this:
import numpy as np
import time
vec = np.array(range(1, int(1e8)))  # range() needs integer arguments; 1e8 is a float
start = time.process_time()
p = np.sum(vec)
print(time.process_time() - start)
Some more experienced user could maybe point you to a benchmark utility that describes better where the time is consumed
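One such utility in the standard library is timeit, which repeats a statement many times and so gives steadier numbers than a single time.process_time() measurement. The sketch below is only an illustration: the size N, the helper name best_time, and the repeat counts are my own assumptions, not something from the question.

```python
import timeit

import numpy as np

N = 1_000_000  # assumption: kept small so the benchmark finishes quickly

# Build both containers up front so their creation is excluded from the timing.
lst = list(range(1, N))
vec = np.arange(1, N, dtype=np.int64)


def best_time(fn, number=10, repeat=5):
    """Hypothetical helper: best average time per call of fn, in seconds."""
    times = timeit.repeat(fn, number=number, repeat=repeat)
    return min(times) / number


t_list = best_time(lambda: sum(lst))      # built-in sum over a Python list
t_vec = best_time(lambda: vec.sum())      # NumPy sum over an existing array
t_mixed = best_time(lambda: np.sum(lst))  # NumPy sum that must convert the list first

print(f"sum(lst):    {t_list * 1e3:.3f} ms")
print(f"vec.sum():   {t_vec * 1e3:.3f} ms")
print(f"np.sum(lst): {t_mixed * 1e3:.3f} ms")
```

Comparing the last two lines shows how much of the np.sum cost comes from converting a Python sequence rather than from the summation itself.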
Upvotes: 1