Reputation: 2485

Why does built-in sum behave wrongly after "from numpy import *"?

I have some code like:

import math, csv, sys, re, time, datetime, pickle, os, gzip
from numpy import *

x = [1, 2, 3, ... ]
y = sum(x)

The sum of the actual values in x is 2165496761, which exceeds the maximum of a signed 32-bit integer (2147483647). The reported y value is -2129470535, implying integer overflow.
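The negative result is consistent with two's-complement wrap-around. A minimal sketch in plain Python, assuming the accumulator is a signed 32-bit integer (the helper name wrap_int32 is hypothetical and used only for illustration):

```python
def wrap_int32(value):
    """Return the value a signed 32-bit integer would hold after wrap-around."""
    return (value + 2**31) % 2**32 - 2**31

exact_total = 2165496761          # the true sum quoted in the question
int32_max = 2**31 - 1             # largest signed 32-bit value
wrapped_total = wrap_int32(exact_total)

print(exact_total > int32_max)    # True
print(wrapped_total)              # -2129470535
```

The wrapped value matches the reported y, which points to a 32-bit accumulator rather than Python's arbitrary-size integers.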

Why did this happen? I thought the built-in sum was supposed to use Python's arbitrary-size integers?

See "How to restore a builtin that I overwrote by accident?" if you've accidentally done something like this at the REPL (interpreter prompt).

Upvotes: 3

Answers (3)

DSM

Reputation: 353169

Doing from numpy import * rebinds the name sum in your namespace to numpy.sum, which shadows the built-in sum function:

>>> sum(xrange(10**7))
49999995000000L
>>> from numpy import sum
>>> sum(xrange(10**7)) # assuming a 32-bit platform
-2014260032

To verify that numpy.sum is in use, check the type of the result:

>>> sum([721832253, 721832254, 721832254])
-2129470535
>>> type(sum([721832253, 721832254, 721832254]))
<type 'numpy.int32'>

To avoid this problem, don't use star imports.

If you must use numpy.sum and want an arbitrary-sized integer result, specify a dtype for the result like so:

>>> sum([721832253, 721832254, 721832254],dtype=object)
2165496761L

or refer to the builtin sum explicitly (possibly giving it a more convenient binding):

>>> __builtins__.sum([721832253, 721832254, 721832254])
2165496761L
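As a hedged sketch of the "more convenient binding" idea in Python 3, the builtins module (spelled __builtin__ in Python 2) gives a stable handle on the original function. The local def sum below is a hypothetical stand-in for whatever a star import might have put in its place; it is not numpy.sum:

```python
import builtins  # Python 3 name; Python 2 calls this module __builtin__

def sum(values):
    """Hypothetical stand-in that shadows the built-in, as a star import would."""
    return -1

values = [721832253, 721832254, 721832254]

shadowed_result = sum(values)            # calls the stand-in above
builtin_sum = builtins.sum               # convenient binding to the real built-in
restored_result = builtin_sum(values)    # exact arbitrary-size integer result

print(shadowed_result)   # -1
print(restored_result)   # 2165496761
```

Unlike __builtins__, which is a CPython implementation detail and can be a plain dict inside imported modules, the builtins module behaves the same everywhere.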

Upvotes: 14

Pierre GM

Reputation: 20339

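The reason you get this invalid value is that np.sum is accumulating in int32. Nothing prevents you from using np.int64 instead of np.int32 to represent your data (np.int128 is not available on most platforms). Note that view only reinterprets the existing bytes and does not convert the values, so convert the data instead. You could for example just use

np.asarray(x, dtype=np.int64).sum()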

On a side note, please never use from numpy import *. It's a terrible practice and a habit to drop as soon as possible. A from ... import * can silently overwrite Python built-ins, which makes the resulting bugs very difficult to track down. Typical examples are functions like sum or max, which NumPy also defines.
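To see which built-ins a star import hides, one option is to run it in a scratch namespace and compare against the builtins module. This is only a sketch: the helper shadowed_builtins is hypothetical, and it uses the standard-library os module as a stand-in so that it runs without NumPy installed; replacing "os" with "numpy" in the exec string is the assumed way to apply the same check there.

```python
import builtins

def shadowed_builtins(namespace):
    """List public names in `namespace` that hide a different built-in object."""
    return sorted(
        name
        for name, value in namespace.items()
        if not name.startswith("_")
        and hasattr(builtins, name)
        and getattr(builtins, name) is not value
    )

scratch = {}
exec("from os import *", scratch)   # star import into a throwaway namespace
hidden = shadowed_builtins(scratch)

print(hidden)   # includes 'open', because os.open replaces the built-in open
```

With import numpy as np instead of a star import, nothing is shadowed and np.sum and the built-in sum remain distinct names.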

Upvotes: 4

Martijn Pieters

Reputation: 1122252

Python handles large numbers with arbitrary precision:

>>> sum([721832253, 721832254, 721832254])
2165496761

Just sum them up!

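To make sure you don't use numpy.sum, call the built-in explicitly: __builtins__.sum() works at the interactive prompt, while in a module it is safer to import __builtin__ (Python 2) or builtins (Python 3) and call its sum.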

Upvotes: 1
