
NumPy beginner: plain Python lists vs. NumPy vectors give different (faulty) results

I'm completely new to NumPy and tried some code from a textbook. Unfortunately, above a certain input size the NumPy results go wrong. Here's the code:

import sys
from datetime import datetime
import numpy

def pythonsum(n):
    a = range(n)
    b = range(n)
    c = []
    for i in range(len(a)):
        a[i] = i**2
        b[i] = i**3
        c.append(a[i]+b[i])
    return c

def numpysum(n):
    a = numpy.arange(n) ** 2
    b = numpy.arange(n) ** 3
    c = a + b
    return c

size = int(sys.argv[1])
start = datetime.now()
c=pythonsum(size)
delta = datetime.now()-start
print "The last 2 elements of the sum",c[-2:]
print "PythonSum elapsed time in microseconds", delta.microseconds
start = datetime.now()
c=numpysum(size)
delta = datetime.now()-start
print "The last 2 elements of the sum",c[-2:]
print "NumPySum elapsed time in microseconds", delta.microseconds

The NumPy results become negative when size >= 1291. I'm working with Python 2.6, Mac OS X 10.6 and NumPy 1.5.0. Any ideas?


Answers (2)

DSM

I think there's some confusion in this thread. The reason the pure-Python (i.e. non-numpy) code works has nothing to do with 32-bit vs. 64-bit: it works correctly on either, because Python ints can be arbitrarily large. [There's an implementation detail in the background: in Python 2 a value too big for a plain int is automatically promoted to a long, but you don't have to worry about it, because the conversion is seamless. That's why you sometimes see an L at the end of a number.]

For example:

>>> 2**100
1267650600228229401496703205376L

On the other hand, numpy integer dtypes have a fixed width, so they will always overflow for large enough numbers, however wide the type is:

>>> for kind in numpy.int8, numpy.int16, numpy.int32, numpy.int64:
...     for power in 1, 2, 5, 20:
...         print kind, power, kind(10), kind(10)**power
... 
<type 'numpy.int8'> 1 10 10
<type 'numpy.int8'> 2 10 100
<type 'numpy.int8'> 5 10 100000
<type 'numpy.int8'> 20 10 -2147483648
<type 'numpy.int16'> 1 10 10
<type 'numpy.int16'> 2 10 100
<type 'numpy.int16'> 5 10 100000
<type 'numpy.int16'> 20 10 -2147483648
<type 'numpy.int32'> 1 10 10
<type 'numpy.int32'> 2 10 100
<type 'numpy.int32'> 5 10 100000
<type 'numpy.int32'> 20 10 1661992960
<type 'numpy.int64'> 1 10 10
<type 'numpy.int64'> 2 10 100
<type 'numpy.int64'> 5 10 100000
<type 'numpy.int64'> 20 10 7766279631452241920
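To see where the int32 and int64 numbers for 10**20 come from, here is a minimal pure-Python sketch of silent two's-complement wraparound: keep the low bits of the exact Python integer and map them back into the signed range. The helper name wrap is hypothetical (it is not a numpy function). It reproduces the int32 and int64 rows for power 20 above; it does not try to explain the int8/int16 rows, which depend on how that numpy version handled scalar powers.

```python
def wrap(value, bits):
    """Value a signed two's-complement integer of width `bits` ends up
    holding when the exact result `value` overflows and silently wraps
    (hypothetical helper, for illustration only)."""
    modulus = 1 << bits
    value %= modulus              # keep the low `bits` bits (always >= 0 here)
    if value >= modulus >> 1:     # top bit set -> negative in two's complement
        value -= modulus
    return value

# The int32 and int64 rows for 10**20 in the table above:
print(wrap(10 ** 20, 32))   # 1661992960
print(wrap(10 ** 20, 64))   # 7766279631452241920

# The question's failure: at index 1290 both powers still fit in int32,
# but their sum 2148353100 does not, so the addition wraps to a negative value.
print(wrap(1290 ** 2 + 1290 ** 3, 32))   # -2146614196
```

The modulo-then-shift approach works because Python's % with a positive modulus always returns a non-negative result, even for negative inputs.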

You can get the same results from numpy as from pure Python by telling it to use the Python type, i.e. dtype=object, albeit at a significant performance hit:

>>> import numpy
>>> numpy.array([10])
array([10])
>>> numpy.array([10])**100
__main__:1: RuntimeWarning: invalid value encountered in power
array([-2147483648])
>>> numpy.array([10], dtype=object)
array([10], dtype=object)
>>> numpy.array([10], dtype=object)**100
array([ 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000], dtype=object)


root-11

Is this from the NumPy 1.5 Beginner's Guide?

The introductory example in "Time for Action - Adding Vectors" only works as intended on a 64-bit platform, where NumPy's default integer type is typically 64 bits wide. Otherwise it returns erroneous results such as:

The last 2 elements of the sum [-2143491644 -2143487647]
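The platform matters because numpy.arange(n) returns NumPy's default integer type, which in NumPy versions of this era is a C long: 32 bits on 32-bit builds, 64 bits on typical 64-bit Linux and Mac OS X builds. As a rough sketch (first_overflow_size is a hypothetical helper, not part of numpy), the first size at which the integer version of numpysum has to go wrong can be computed with exact Python integers. It assumes every intermediate value is held in the same signed integer type; with 32 bits it reproduces the question's threshold of 1291, and with 64 bits the problem is only postponed, not removed.

```python
def first_overflow_size(bits):
    """Smallest `size` for which numpysum(size), computed entirely in signed
    `bits`-bit integers, must contain a wrong element (hypothetical helper).

    i**2 + i**3 grows with i, and each intermediate (i**2, i**3) is no larger
    than the final sum, so the first bad element is the first index i whose
    exact sum exceeds the largest representable value.
    """
    max_int = 2 ** (bits - 1) - 1

    def too_big(i):
        return i ** 2 + i ** 3 > max_int

    lo, hi = 0, 1
    while not too_big(hi):      # double until we overshoot
        hi *= 2
    while lo < hi:              # binary search for the first index that overflows
        mid = (lo + hi) // 2
        if too_big(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo + 1               # index lo is part of the output once size == lo + 1

print(first_overflow_size(32))  # 1291, the threshold seen in the question
print(first_overflow_size(64))  # 2097153
```

A binary search is used instead of a plain loop so that the 64-bit case, which lies above two million, is still instant.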

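To solve this, make the exponent in the power expression a float (** 2. instead of ** 2), so that the calculation is carried out in floating point instead of fixed-width integers. Result: NumPy is about 10 times faster than pure Python (the two numbers after each "elapsed time" label are whole seconds and then microseconds, i.e. roughly 3.06 s vs. 0.31 s):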

$ python vectorsum.py 1000000
The last 2 elements of the sum [9.99995000008e+17, 9.99998000001e+17]
PythonSum elapsed time in microseconds 3 59013
The last 2 elements of the sum [ 9.99993999e+17 9.99996999e+17]
NumPySum elapsed time in microseconds 0 308598
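Switching to float exponents trades overflow for rounding: a float64 (like a Python float on common platforms) has a 53-bit significand, so integers above 2**53 can only be stored approximately, and for size 1000000 the sums are no longer exact. A small stdlib-only sketch (float_sum_is_exact is a hypothetical helper) that mimics the float computation for a single index using Python floats and multiplications:

```python
def float_sum_is_exact(i):
    """Whether i**2 + i**3 evaluated in double precision equals the exact
    integer result. The float version uses multiplications, whose results
    are exact whenever they are representable (hypothetical helper)."""
    x = float(i)
    approx = x * x + x * x * x
    return int(approx) == i ** 2 + i ** 3

print(float_sum_is_exact(1290))     # True
print(float_sum_is_exact(208063))   # True: the sum is still below 2**53
print(float_sum_is_exact(999999))   # False: the exact sum needs more than 53 significant bits
```

So the float version no longer turns negative, but for large sizes its results are close approximations rather than exact integers.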

The corrected example:

import sys
from datetime import datetime
import numpy

def numpysum(n):
    # the float exponent makes the results float64, so they cannot wrap around
    a = numpy.arange(n) ** 2.
    b = numpy.arange(n) ** 3.
    c = a + b
    return c

def pythonsum(n):
    a = range(n)
    b = range(n)
    c = []
    for i in range(len(a)):
        a[i] = i ** 2.     # notice the dot (!)
        b[i] = i ** 3.
        c.append(a[i] + b[i])
    return c

size = int(sys.argv[1])
start = datetime.now()
c = pythonsum(size)
delta = datetime.now() - start
print "The last 2 elements of the sum", c[-2:]
# prints whole seconds, then the remaining microseconds
print "PythonSum elapsed time in microseconds", delta.seconds, delta.microseconds
start = datetime.now()
c = numpysum(size)
delta = datetime.now() - start
print "The last 2 elements of the sum", c[-2:]
print "NumPySum elapsed time in microseconds", delta.seconds, delta.microseconds

The code is also available on Pastebin: http://paste.ubuntu.com/1169976/
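A side note on the timing output: timedelta.microseconds is only the sub-second component (0 to 999999), which is why this version prints delta.seconds as well, and why the question's version, which prints only delta.microseconds, under-reports any run longer than a second. A minimal sketch of a helper (elapsed_microseconds is a hypothetical name) that folds days, seconds and microseconds into one number; it avoids timedelta.total_seconds(), which only appeared in Python 2.7:

```python
from datetime import timedelta

def elapsed_microseconds(delta):
    """Total length of a timedelta in whole microseconds (hypothetical helper).

    timedelta stores days, seconds (0-86399) and microseconds (0-999999)
    separately, so every component has to be included.
    """
    return (delta.days * 86400 + delta.seconds) * 10 ** 6 + delta.microseconds

# The PythonSum timing printed above ("3 59013") is 3 s + 59013 us:
print(elapsed_microseconds(timedelta(seconds=3, microseconds=59013)))  # 3059013
```

In the script, a line such as print "PythonSum elapsed time in microseconds", elapsed_microseconds(delta) would then print a single, correctly labelled number.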
