Reputation: 366
I'm completely new to NumPy and tried some textbook code. Unfortunately, beyond a certain problem size, the NumPy results come out wrong. Here's the code:
import sys
from datetime import datetime
import numpy

def pythonsum(n):
    a = range(n)
    b = range(n)
    c = []
    for i in range(len(a)):
        a[i] = i**2
        b[i] = i**3
        c.append(a[i]+b[i])
    return c

def numpysum(n):
    a = numpy.arange(n) ** 2
    b = numpy.arange(n) ** 3
    c = a + b
    return c

size = int(sys.argv[1])
start = datetime.now()
c=pythonsum(size)
delta = datetime.now()-start
print "The last 2 elements of the sum",c[-2:]
print "PythonSum elapsed time in microseconds", delta.microseconds
start = datetime.now()
c=numpysum(size)
delta = datetime.now()-start
print "The last 2 elements of the sum",c[-2:]
print "NumPySum elapsed time in microseconds", delta.microseconds
The results become negative when size >= 1291. I'm working with Python 2.6, Mac OS X 10.6, NumPy 1.5.0. Any ideas?
Upvotes: 1
Views: 359
Reputation: 353059
I think there's some confusion in this thread. The reason that the pure-Python (i.e. non-numpy) code works doesn't have anything to do with 32-bit vs 64-bit. It will work correctly on either: Python ints can be of arbitrary size. [There's a bit of an implementation detail in the background involving whether it calls something an int or a long, but you don't have to worry about it; the conversion is seamless. That's why sometimes you'll see L at the end of a number.]
For example:
>>> 2**100
1267650600228229401496703205376L
On the other hand, numpy integer dtypes are fixed-precision, and will always overflow for large enough numbers, regardless of how wide they are:
>>> for kind in numpy.int8, numpy.int16, numpy.int32, numpy.int64:
...     for power in 1, 2, 5, 20:
...         print kind, power, kind(10), kind(10)**power
...
<type 'numpy.int8'> 1 10 10
<type 'numpy.int8'> 2 10 100
<type 'numpy.int8'> 5 10 100000
<type 'numpy.int8'> 20 10 -2147483648
<type 'numpy.int16'> 1 10 10
<type 'numpy.int16'> 2 10 100
<type 'numpy.int16'> 5 10 100000
<type 'numpy.int16'> 20 10 -2147483648
<type 'numpy.int32'> 1 10 10
<type 'numpy.int32'> 2 10 100
<type 'numpy.int32'> 5 10 100000
<type 'numpy.int32'> 20 10 1661992960
<type 'numpy.int64'> 1 10 10
<type 'numpy.int64'> 2 10 100
<type 'numpy.int64'> 5 10 100000
<type 'numpy.int64'> 20 10 7766279631452241920
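This also explains the threshold in the question. The sketch below is not the original code: it assumes a current Python 3 and NumPy, and it forces `dtype=numpy.int32` explicitly (an assumption standing in for whatever the default integer width was on the asker's machine). The helper names `numpysum32` and `first_overflow` are made up for illustration. It compares the fixed-width result against exact Python integers and reports the first index where they disagree.

```python
import numpy as np

def numpysum32(n):
    # Same computation as the question's numpysum, but with the integer
    # width pinned to 32 bits so the behaviour does not depend on the platform.
    a = np.arange(n, dtype=np.int32) ** 2
    b = np.arange(n, dtype=np.int32) ** 3
    return a + b

def first_overflow(n):
    # Exact reference values computed with arbitrary-precision Python ints.
    exact = [i ** 2 + i ** 3 for i in range(n)]
    got = numpysum32(n)
    for i in range(n):
        if int(got[i]) != exact[i]:
            return i
    return None

print(np.iinfo(np.int32).max)   # largest value an int32 can hold
print(first_overflow(2000))     # first index whose sum no longer fits
```

With 32-bit integers, 1290**3 still fits but 1290**2 + 1290**3 does not, which matches the observation that results turn negative once size >= 1291.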
You can get the same results from numpy as from pure Python by telling it to use the Python type, i.e. dtype=object, albeit at a significant performance hit:
>>> import numpy
>>> numpy.array([10])
array([10])
>>> numpy.array([10])**100
__main__:1: RuntimeWarning: invalid value encountered in power
array([-2147483648])
>>> numpy.array([10], dtype=object)
array([10], dtype=object)
>>> numpy.array([10], dtype=object)**100
array([ 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000], dtype=object)
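Applied to the function from the question, the same idea might look like the sketch below. It assumes a current Python 3 and NumPy; the `dtype` parameter is an addition for illustration and is not part of the textbook code. Note that `numpy.int64` only postpones the problem: it overflows too once i**3 exceeds 2**63 - 1, which happens for n a little above two million.

```python
import numpy as np

def numpysum(n, dtype=np.int64):
    # Hypothetical variant of the question's numpysum with a selectable dtype.
    # dtype=object stores Python ints, so the arithmetic is exact but slow.
    base = np.array(range(n), dtype=dtype)
    a = base ** 2
    b = base ** 3
    return a + b

def pythonsum(n):
    # Exact reference using arbitrary-precision Python ints.
    return [i ** 2 + i ** 3 for i in range(n)]

n = 2000
exact = pythonsum(n)
print(numpysum(n, np.int32).tolist() == exact)  # 32-bit integers overflow here
print(numpysum(n, np.int64).tolist() == exact)  # wide enough for this n
print(numpysum(n, object).tolist() == exact)    # exact for any n, but slow
```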
Upvotes: 0
Reputation: 1806
Beginning Numpy 1.5 ?
The introductory example in "Time for Action - Adding Vectors" only gives correct results where NumPy's default integer is 64 bits wide (and even then only up to a point). Otherwise it returns erroneous results:
The last 2 elements of the sum [-2143491644 -2143487647]
To solve this issue, make the exponent in the power operation a float, so that the whole computation is carried out in floating point. Result: a speed-up by a factor of 10.
$ python vectorsum.py 1000000
The last 2 elements of the sum [9.99995000008e+17, 9.99998000001e+17]
PythonSum elapsed time in microseconds 3 59013
The last 2 elements of the sum [ 9.99993999e+17 9.99996999e+17]
NumPySum elapsed time in microseconds 0 308598
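One caveat with the float approach, sketched below under the assumption of a current Python 3 and NumPy: a 64-bit float carries only 53 bits of mantissa, so sums of this magnitude no longer overflow but are rounded rather than exact. The helper names `float_sum_last` and `exact_sum_last` are made up for illustration.

```python
import numpy as np

def float_sum_last(n):
    # Float exponents make NumPy compute in float64 instead of a fixed-width int.
    i = np.arange(n)
    c = i ** 2. + i ** 3.
    return c[-1]

def exact_sum_last(n):
    # The same last element, computed exactly with Python ints.
    i = n - 1
    return i ** 2 + i ** 3

n = 1000000
approx = int(float_sum_last(n))
exact = exact_sum_last(n)
print(exact)                 # exact integer result
print(abs(approx - exact))   # small but non-zero rounding error
```

Whether that rounding matters depends on the application; for a timing comparison like this one it is harmless.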
The corrected example:
import sys
from datetime import datetime
import numpy
def numpysum(n):
    a = numpy.arange(n) ** 2.
    b = numpy.arange(n) ** 3.
    c = a + b
    return c

def pythonsum(n):
    a = range(n)
    b = range(n)
    c = []
    for i in range(len(a)):
        a[i] = i ** 2.  # notice the dot (!)
        b[i] = i ** 3.
        c.append(a[i] + b[i])
    return c

size = int(sys.argv[1])
start = datetime.now()
c = pythonsum(size)
delta = datetime.now() - start
print "The last 2 elements of the sum", c[-2:]
print "PythonSum elapsed time in microseconds", delta.seconds, delta.microseconds
start = datetime.now()
c = numpysum(size)
delta = datetime.now() - start
print "The last 2 elements of the sum", c[-2:]
print "NumPySum elapsed time in microseconds", delta.seconds, delta.microseconds
The code is also available on pastebin: http://paste.ubuntu.com/1169976/
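A side note on the timing output: `delta.microseconds` is only the sub-second component of a `timedelta`, which is why the listing prints `delta.seconds` next to it. If a single number is preferred, the components can be combined as in the sketch below; `elapsed_microseconds` is a hypothetical helper, not part of the original example.

```python
from datetime import timedelta

def elapsed_microseconds(delta):
    # Combine the days, seconds and microseconds components of a timedelta
    # into one total, expressed in microseconds.
    return (delta.days * 86400 + delta.seconds) * 10 ** 6 + delta.microseconds

# The PythonSum timing shown above was printed as "3 59013":
print(elapsed_microseconds(timedelta(seconds=3, microseconds=59013)))
```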
Upvotes: 1