Reputation: 1

Time-Taken: np.sum vs sum

NumPy is known for being faster than built-in containers such as lists and tuples. But with the code below, if I drop the np and use the built-in sum, it takes around 1.8 seconds, whereas np.sum takes over 21 seconds. Can you explain why?

import numpy as np
import time
start = time.process_time() 
p = np.sum(range(1,100000000))   
print(time.process_time() - start)

Upvotes: 0

Answers (3)

MashaMasha

Reputation: 3

It seems that most of the time is spent converting the Python range to a NumPy array. You can check this by timing the conversion alone:

import numpy as np
import time
start = time.process_time() 
p = np.array(range(1,100000000))   
print(time.process_time() - start)

It's about 16 seconds on my CPU.

np.sum(range(1,100000000)) is executed in the following sequence:

a range object is created by the range function (a lazy sequence, not a generator)
an np.array is created from that range object, element by element (this is where most of the time is spent)
np.sum is calculated on the resulting array
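To see how those steps split the total time, here is a minimal sketch that times the conversion and the reduction separately. The helper name time_steps and the smaller n are my own choices for illustration, and the explicit dtype=np.int64 is an assumption added so the total does not overflow on platforms where the default integer is 32-bit.

```python
import time
import numpy as np

def time_steps(n):
    """Time the range -> array conversion and the sum separately (hypothetical helper)."""
    r = range(1, n)  # lazy range object, cheap to create

    t0 = time.perf_counter()
    arr = np.array(r, dtype=np.int64)  # each Python int is converted one at a time
    t1 = time.perf_counter()
    total = np.sum(arr)                # vectorized reduction over the finished array
    t2 = time.perf_counter()

    return {"convert": t1 - t0, "sum": t2 - t1, "total": int(total)}

# Smaller n than in the question so the sketch runs quickly.
result = time_steps(1_000_000)
print(f"convert: {result['convert']:.4f} s, sum: {result['sum']:.4f} s")
```

On most machines the convert figure should dominate, but the exact numbers depend on your hardware and NumPy version.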

I advise you to use np.arange instead of range:

import numpy as np
import time
start = time.process_time() 
p = np.sum(np.arange(1,100000000))   
print(time.process_time() - start)

In this case, the sum was calculated faster than for sum(range(1,100000000)).

Upvotes: 0

Paul H

Reputation: 68256

Doing my best to compare apples to apples here, let's fully create a list and a numpy array of integers ranging from 1 to 999,999 (the upper bound of both range and np.arange is exclusive):

import numpy as np

lst = list(range(1, int(1e6)))
vec = np.arange(1, 1e6, dtype=int)

Now let's use a purpose-built timing utility in jupyter to compare the operations:

%%timeit
sum(lst)

Which gives me:

9.14 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
vec.sum()

Which gives me:

957 µs ± 39.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So it seems to me that if you ensure that you have the same data type and leave the object creation out of the timing, numpy's performance lives up to its claims.

Upvotes: 0

Kae Gremes

Reputation: 11

I have low rep so I can't comment, but it may be because of the time it takes to convert the range into a numpy array.

I would benchmark with something like this:

import numpy as np
import time

vec = np.array(range(1, int(1e8)))  # range() needs int arguments; 1e8 is a float

start = time.process_time() 
p = np.sum(vec)   
print(time.process_time() - start)

A more experienced user could maybe point you to a benchmarking utility that shows more precisely where the time is spent.
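One such utility is the standard-library timeit module, which works outside Jupyter as well. The following is a rough sketch, not a definitive benchmark: the size N, the helper best_time, and the repeat counts are assumptions picked to keep the run short.

```python
import timeit
import numpy as np

N = 1_000_000  # assumed size, much smaller than in the question
lst = list(range(1, N))
vec = np.arange(1, N, dtype=np.int64)

def best_time(func, repeat=5, number=10):
    """Return the best per-call time in seconds (hypothetical helper)."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number

t_builtin = best_time(lambda: sum(lst))    # built-in sum over a list
t_numpy = best_time(lambda: vec.sum())     # NumPy sum over an existing array
t_mixed = best_time(lambda: np.sum(lst))   # NumPy sum that must first convert the list

print(f"sum(lst):    {t_builtin:.6f} s")
print(f"vec.sum():   {t_numpy:.6f} s")
print(f"np.sum(lst): {t_mixed:.6f} s")
```

The third measurement includes the list-to-array conversion on every call, which is the situation in the original question.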

Upvotes: 1
