I wrote this simple code below to compare the speed of Python and Fortran.
import time

start = time.time()
x = 0
for i in range(1, 1000000000):
    x = x + i
x = 0
end = time.time()
print end - start
I set x = 0 after the loop because in Fortran the sum overflows. The equivalent Fortran code returns the answer in 9 seconds, but when I run this code in Python, it consumes all 24 GB of my system's RAM and the program crashes.
What's the problem?
Upvotes: 0
Views: 2048
You are using range(), which produces a list object of all integers in the range. You produced a list object with 10**9 integers in it; on a 64-bit machine that takes about 30GB to hold all those objects, provided your OS will let you allocate that much.
Use the xrange() function instead; this produces an object that only yields numbers as you iterate:
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> xrange(10)
xrange(10)
>>> list(xrange(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
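(As an aside: in Python 3, range() is itself lazy, so xrange() is gone. A small sketch that works under both versions, assuming only the standard library:)

```python
# Pick the lazy range for whichever Python is running.
try:
    lazy_range = xrange          # Python 2: xrange is the lazy one
except NameError:
    lazy_range = range           # Python 3: range is already lazy

n = 10**6
# Iterating yields numbers one at a time; no million-element list
# is ever materialized in memory.
total = sum(lazy_range(n))
assert total == n * (n - 1) // 2  # closed-form check: 0 + 1 + ... + (n-1)
```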
Summing 1 billion integers will still take time in Python; you can use sum(xrange(10**9)) to delegate most of the work to C code, but the process still has to produce 1 billion integer objects along the way.
My rather aging 2008 vintage Macbook Pro does this:
>>> import timeit
>>> timeit.timeit('sum(xrange(10**9))', number=3)
38.89629793167114
so it takes, on average, about 13 seconds per run to produce and destroy 1 billion int objects. The actual summing is dwarfed by those allocations and deallocations.
Use NumPy instead for calculations involving large datasets; a NumPy array holds C datatypes, not Python objects, so it takes vastly less memory, and the library offers optimized routines that beat anything pure Python can do hands-down.
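A minimal sketch of the NumPy approach (assuming NumPy is installed; using 10**6 elements here to keep the demo's memory footprint small):

```python
import numpy as np

n = 10**6
# np.arange builds a C-typed array: 8 bytes per int64 element,
# not a million separate Python int objects.
a = np.arange(n, dtype=np.int64)
total = a.sum()                    # summation loop runs in optimized C
assert total == n * (n - 1) // 2   # matches the closed-form sum
```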
Upvotes: 16