Robin Skoge

Reputation: 25

Numpy delivering different result

Can someone explain why these two operations give different results? Is there some sort of maximum output value? I don't mean the difference in time, but the difference in the calculated result.


import numpy as np

l = list(range(100000000))
a = np.arange(100000000)

%time np.sum(a ** 2)
CPU times: user 132 ms, sys: 217 ms, total: 348 ms
Wall time: 347 ms
662921401752298880

%time sum([x ** 2 for x in l])
CPU times: user 23.8 s, sys: 1.32 s, total: 25.1 s
Wall time: 25.1 s
333333328333333350000000

Upvotes: 1

Views: 77

Answers (2)

Stefan B

Reputation: 1677

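As pointed out by pLOPeGG, BLimitless and KaPy3141, numpy's fixed-width integers can overflow. You can circumvent that by specifying dtype='object', which makes numpy store ordinary Python ints of unlimited size (and is still slightly faster than the pure-Python sum):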

In [1]: import numpy as np

In [2]: n = 10_000_000

In [3]: %timeit np.sum(np.arange(n, dtype='object')**2)
2.28 s ± 56.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit sum(i**2 for i in range(n))
2.6 s ± 534 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# also, just for completeness' sake:
# multiplying a number by itself is considerably faster than squaring it with **
In [5]: %timeit sum(i*i for i in range(n))
1.29 s ± 242 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# fastest thing I could come up with:
In [6]: %timeit a = np.arange(n, dtype='object'); np.sum(a * a)
939 ms ± 74.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
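
To check that dtype='object' really gives the exact result while int64 does not, here is a minimal sketch. The input is a hypothetical one chosen for speed (1000 values near 10**9, not the range from the question): each square still fits in int64, but their total does not.

```python
import numpy as np

# Hypothetical small input: each square (~1e18) fits in int64,
# but the sum of 1000 of them (~1e21) does not.
start, count = 10**9, 1000

# Reference value computed with Python's arbitrary-precision ints.
exact = sum(i * i for i in range(start, start + count))

a64 = np.arange(start, start + count, dtype=np.int64)
aobj = a64.astype(object)  # elements become ordinary Python ints

sum64 = int(np.sum(a64 * a64))   # wraps around silently
sumobj = np.sum(aobj * aobj)     # exact, but computed element by element in Python

print("int64 matches exact: ", sum64 == exact)
print("object matches exact:", sumobj == exact)
```

The object array gives up numpy's fast native arithmetic, which is why the timings above are close to the pure-Python ones.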

Upvotes: 1

BLimitless

Reputation: 2575

@robin has it right: the result is larger than numpy's fixed-size integer type (int64 here) can hold, so it overflows and wraps around without raising an error. This is a known issue and will hopefully be fixed soon. These links have more info: https://github.com/numpy/numpy/issues/8987 and https://github.com/numpy/numpy/issues/10964
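
The two numbers in the question can be tied together without allocating the full array. The sketch below assumes the array was int64 and uses a helper, wrap_int64, which is a name made up for this illustration and not a numpy function. It reduces the exact Python result to what a signed 64-bit integer would hold, then checks the same behaviour on a smaller array.

```python
import numpy as np

def wrap_int64(value: int) -> int:
    """Reduce an exact Python int to the value a signed 64-bit integer would hold."""
    value &= (1 << 64) - 1  # keep only the low 64 bits
    return value - (1 << 64) if value >= (1 << 63) else value

# Exact result from the question (pure-Python sum for n = 100000000).
exact_question = 333333328333333350000000
print(wrap_int64(exact_question))  # 662921401752298880, the value numpy printed

# Same effect on a smaller array (n chosen here only to keep the run short).
n = 4_000_000
exact = (n - 1) * n * (2 * n - 1) // 6  # closed form for sum of i**2, i = 0..n-1
a = np.arange(n, dtype=np.int64)
numpy_sum = int(np.sum(a ** 2))

print("exceeds int64 max:  ", exact > np.iinfo(np.int64).max)
print("numpy equals exact: ", numpy_sum == exact)
print("numpy equals wrapped:", numpy_sum == wrap_int64(exact))
```

So numpy's answer is the true sum modulo 2**64, read as a signed integer.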

Upvotes: 1
