Reputation: 2485
I have some code like:
import math, csv, sys, re, time, datetime, pickle, os, gzip
from numpy import *
x = [1, 2, 3, ... ]
y = sum(x)
The sum of the actual values in x is 2165496761, which exceeds the range of a signed 32-bit integer. The reported value of y is -2129470535, which implies integer overflow.
Why did this happen? I thought the built-in sum
was supposed to use Python's arbitrary-size integers?
See How to restore a builtin that I overwrote by accident? if you've accidentally done something like this at the REPL (interpreter prompt).
Upvotes: 3
Views: 5777
Reputation: 353169
Doing from numpy import * rebinds the name sum in your namespace to numpy.sum, shadowing the built-in function:
>>> sum(xrange(10**7))
49999995000000L
>>> from numpy import sum
>>> sum(xrange(10**7)) # assuming a 32-bit platform
-2014260032
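The negative value is what remains when the exact sum is cut down to 32 bits and read as a two's-complement number. A minimal pure-Python sketch of that arithmetic, assuming Python 3 (range instead of xrange); wrap_int32 is a hypothetical helper written for illustration, not a NumPy function:

```python
def wrap_int32(value):
    """Reduce an exact integer to what a signed 32-bit accumulator would hold."""
    value &= 0xFFFFFFFF          # keep only the low 32 bits
    if value >= 0x80000000:      # top bit set means negative in two's complement
        value -= 0x100000000
    return value

exact = sum(range(10**7))        # built-in sum, arbitrary precision
wrapped = wrap_int32(exact)
print(exact, wrapped)            # 49999995000000 -2014260032

# The three values from the question wrap the same way.
print(wrap_int32(721832253 + 721832254 + 721832254))  # -2129470535
```

Both wrapped results match the values reported in this thread, which is consistent with a 32-bit accumulator being used.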
To verify that numpy.sum is in use, check the type of the result:
>>> sum([721832253, 721832254, 721832254])
-2129470535
>>> type(sum([721832253, 721832254, 721832254]))
<type 'numpy.int32'>
To avoid this problem, don't use a star import.
If you must use numpy.sum
and want an arbitrary-sized integer result, specify a dtype
for the result like so:
>>> sum([721832253, 721832254, 721832254], dtype=object)
2165496761L
or refer to the builtin sum
explicitly (possibly giving it a more convenient binding):
>>> __builtins__.sum([721832253, 721832254, 721832254])
2165496761L
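Note that __builtins__ is a CPython implementation detail, and inside an imported module it can be a plain dict rather than a module. A more portable sketch, assuming Python 3, goes through the builtins module (named __builtin__ in Python 2). The local sum defined here is a hypothetical stand-in for whatever a star import might have bound to that name:

```python
import builtins  # Python 3 name; Python 2 calls this module __builtin__

def sum(values):
    """Hypothetical stand-in for a shadowing import such as numpy.sum."""
    raise RuntimeError("shadowed sum called")

# A separate binding that is unaffected by the name sum being rebound.
builtin_sum = builtins.sum

total = builtin_sum([721832253, 721832254, 721832254])
print(total)  # 2165496761
```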
Upvotes: 14
Reputation: 20339
The reason you get this invalid value is that np.sum is accumulating in an int32. Nothing prevents you from using np.int64 instead of np.int32 to represent your data. For example, you could convert to a 64-bit array before summing:
np.asarray(x, dtype=np.int64).sum()
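A short sketch of the 64-bit approach, assuming NumPy is installed; the variable names are illustrative. It uses astype, which converts the values, rather than view, which only reinterprets the existing bytes. The int32 accumulator is requested explicitly because a 64-bit platform would otherwise widen it by default:

```python
import numpy as np  # assumes NumPy is available

values = [721832253, 721832254, 721832254]
arr32 = np.array(values, dtype=np.int32)

# Forcing a 32-bit accumulator reproduces the wraparound from the question.
wrapped = int(arr32.sum(dtype=np.int32))

# Option 1: keep the int32 data but accumulate in 64 bits.
total_acc = int(arr32.sum(dtype=np.int64))

# Option 2: convert the data itself to int64 before summing.
total_conv = int(arr32.astype(np.int64).sum())

print(wrapped, total_acc, total_conv)
```

Passing dtype to sum avoids copying the array, so it is probably the cheaper choice when the data is already stored as int32.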
On a side note, please make sure that you never use from numpy import *. It's a terrible practice and a habit you should drop as soon as possible. When you use from ... import *, you may overwrite Python built-ins, which makes the resulting bugs very difficult to track down. A typical example is the shadowing of functions like sum or max.
Upvotes: 4
Reputation: 1122252
Python handles large numbers with arbitrary precision:
>>> sum([721832253, 721832254, 721832254])
2165496761
Just sum them up!
To make sure you don't use numpy.sum
, try __builtins__.sum()
instead.
Upvotes: 1