numpy beginner array plain python vs. numpy vectors: faulty results

Question

I'm completely new to NumPy and tried some code from a textbook. Unfortunately, beyond a certain input size, the NumPy results come out wrong. Here's the code:

import sys
from datetime import datetime
import numpy

def pythonsum(n):
    a = range(n)
    b = range(n)
    c = []
    for i in range(len(a)):
        a[i] = i**2
        b[i] = i**3
        c.append(a[i]+b[i])
    return c

def numpysum(n):
    a = numpy.arange(n) ** 2
    b = numpy.arange(n) ** 3
    c = a + b
    return c

size = int(sys.argv[1])
start = datetime.now()
c=pythonsum(size)
delta = datetime.now()-start
print "The last 2 elements of the sum",c[-2:]
print "PythonSum elapsed time in microseconds", delta.microseconds
start = datetime.now()
c=numpysum(size)
delta = datetime.now()-start
print "The last 2 elements of the sum",c[-2:]
print "NumPySum elapsed time in microseconds", delta.microseconds

The results become negative when size >= 1291. I'm working with Python 2.6, Mac OS X 10.6, and NumPy 1.5.0. Any ideas?

DSM · Accepted Answer

I think there's some confusion in this thread. The reason that the pure-Python, i.e. non-numpy, code works doesn't have anything to do with 32-bit vs 64-bit. It will work correctly on either: Python ints can be of arbitrary size. [There's a bit of an implementation detail in the background involving whether it calls something an int or a long but you don't have to worry about it, the conversion is seamless. That's why sometimes you'll see L at the end of a number.]

For example:

>>> 2**100
1267650600228229401496703205376L

On the other hand, numpy integer dtypes are fixed-precision, and will always overflow for large enough numbers, regardless of how wide the dtype is:

>>> for kind in numpy.int8, numpy.int16, numpy.int32, numpy.int64:
...     for power in 1, 2, 5, 20:
...         print kind, power, kind(10), kind(10)**power
... 
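<type 'numpy.int8'> 1 10 10
<type 'numpy.int8'> 2 10 100
<type 'numpy.int8'> 5 10 100000
<type 'numpy.int8'> 20 10 -2147483648
<type 'numpy.int16'> 1 10 10
<type 'numpy.int16'> 2 10 100
<type 'numpy.int16'> 5 10 100000
<type 'numpy.int16'> 20 10 -2147483648
<type 'numpy.int32'> 1 10 10
<type 'numpy.int32'> 2 10 100
<type 'numpy.int32'> 5 10 100000
<type 'numpy.int32'> 20 10 1661992960
<type 'numpy.int64'> 1 10 10
<type 'numpy.int64'> 2 10 100
<type 'numpy.int64'> 5 10 100000
<type 'numpy.int64'> 20 10 7766279631452241920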
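
To tie this back to the question, here is a rough sketch of where the 1291 threshold could come from. It assumes the default integer dtype on the asker's setup was 32-bit, which the negative results suggest but which I haven't verified. The helper first_failing_size and the dtype parameter on numpysum are my own additions for illustration, not part of the original code.

```python
import numpy

INT32_MAX = numpy.iinfo(numpy.int32).max  # largest value a 32-bit signed int can hold

def first_failing_size(limit=INT32_MAX):
    """Smallest n whose last element, (n-1)**2 + (n-1)**3, exceeds limit."""
    i = 0
    while i**2 + i**3 <= limit:  # plain Python ints, so this arithmetic is exact
        i += 1
    return i + 1  # numpy.arange(n) stops at n - 1

def numpysum(n, dtype):
    """Same computation as the question's numpysum, with an explicit dtype."""
    a = numpy.arange(n, dtype=dtype) ** 2
    b = numpy.arange(n, dtype=dtype) ** 3
    return a + b

n = first_failing_size()
print(n)  # 1291
print(numpysum(n, numpy.int32)[-1])  # wraps around to a negative number
print(numpysum(n, numpy.int64)[-1])  # fits in 64 bits: 1290**2 + 1290**3
```

Asking for dtype=numpy.int64 explicitly only moves the limit further out. It doesn't remove it, so sufficiently large inputs will still wrap around.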

You can get the same results from numpy as from pure Python by telling it to use the Python type, i.e. dtype=object, albeit at a significant performance hit:

>>> import numpy
>>> numpy.array([10])
array([10])
>>> numpy.array([10])**100
__main__:1: RuntimeWarning: invalid value encountered in power
array([-2147483648])
>>> numpy.array([10], dtype=object)
array([10], dtype=object)
>>> numpy.array([10], dtype=object)**100
array([ 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000], dtype=object)
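
Applied to the function from the question, that might look something like the sketch below. The name numpysum_object is made up for this example, and I'm building the array from a list rather than with numpy.arange just to be explicit that the elements are Python ints.

```python
import numpy

def numpysum_object(n):
    """Variant of the question's numpysum that stores Python ints (dtype=object)."""
    base = numpy.array(list(range(n)), dtype=object)  # each element is a Python int
    return base ** 2 + base ** 3  # element-wise, using Python's own arithmetic

result = numpysum_object(2000)
print(result[-1] == 1999**2 + 1999**3)  # True

# A value whose cube no longer fits in a 64-bit integer is still exact.
big = numpy.array([3000000], dtype=object)
print((big ** 2 + big ** 3)[0] == 3000000**2 + 3000000**3)  # True
```

Every operation here goes through Python objects one element at a time, so expect this to run much closer to the speed of the plain-Python loop than to the fixed-width numpy version.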
