Reputation: 11
I've found that Python's VM seems to be very unfriendly to the CPU cache.
For example:
import time
a = range(500)
sum(a)
for i in range(1000000): # just to create a time interval; does this disturb the CPU cache?
    pass
st = time.time()
sum(a)
print (time.time() - st)*1e6
time:
> 100us
Another case:
import time
a = range(500)
for i in range(100000):
    st = time.time()
    sum(a)
    print (time.time() - st)*1e6
time:
~ 20us
We can see that when the code runs frequently, it becomes much faster. Is there a solution?
We can refer to this question (it should be the same problem): python function(or a code block) runs much slower with a time interval in a loop
I feel this question is very difficult: one must have an in-depth understanding of the mechanisms of the Python virtual machine, C, and the CPU cache.
Do you have any suggestions about where to post this question to get an answer?
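(For what it's worth, manual time.time() deltas are noisy; the standard timeit module gives more stable numbers. A minimal sketch, Python 3 syntax:)

```python
import timeit

# Time sum(a) over many repetitions. timeit disables the cyclic GC during
# measurement and returns the total for `number` calls, so divide to get
# the per-call time.
setup = "a = list(range(500))"
per_call_us = min(timeit.repeat("sum(a)", setup=setup,
                                repeat=5, number=10000)) / 10000 * 1e6
print("%.2f microseconds per call" % per_call_us)
```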
Upvotes: 1
Views: 2188
Reputation: 55499
Your hypothesis "we can see when running frequently, the code becomes much faster" is not quite correct.
This loop

#just to create a time interval, seems this disturb cpu cache?
for i in range(1000000):
    pass

isn't merely a simple time delay. In Python 2 it creates a list of 1,000,000 integer objects, loops over it, and then destroys it. In Python 3, range() returns a lazy sequence that produces its values on demand, so it's much less wasteful of memory; the Python 2 equivalent is xrange().
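The difference is easy to observe with sys.getsizeof (Python 3 shown; a quick sketch):

```python
import sys

r = range(1000000)        # Python 3: a lazy sequence, values computed on demand
lst = list(range(1000))   # a materialized list of 1000 int objects

# The range object is a small fixed-size wrapper no matter how long it is,
# while the list's size grows with its element count.
print(sys.getsizeof(r))
print(sys.getsizeof(lst))
```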
In the standard CPython implementation, a list is an array of pointers to the objects it contains. On a 32 bit system, an integer object occupies 12 bytes, an empty list object occupies 32 bytes, and of course a pointer occupies 4 bytes. So range(1000000) consumes a total of 32 + 1000000 * (4 + 12) = 16000032 bytes.
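(Those figures are for 32-bit Python 2; the sizes differ on other builds. You can redo the arithmetic for your own interpreter with sys.getsizeof; a rough sketch that ignores small-int caching:)

```python
import sys

# Per-object sizes vary by platform and Python version; on 64-bit
# CPython 3, a small int is typically 28 bytes and an empty list 56.
int_size = sys.getsizeof(0)
empty_list_size = sys.getsizeof([])
pointer_size = 8 if sys.maxsize > 2**32 else 4

n = 1000000
estimate = empty_list_size + n * (pointer_size + int_size)
print("rough size of a list of %d small ints: %d bytes" % (n, estimate))
```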
Once the loop finishes, the reference count on the range() list drops to zero, so it and its contents are eligible for garbage collection. It takes a little time to deallocate a million integers, so it's not surprising that you notice a delay after the loop finishes. Also, creating and deallocating all those objects generally causes some memory fragmentation, which impacts the efficiency of subsequent code.
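The deallocation cost can be seen directly (a sketch; absolute times depend entirely on the machine):

```python
import time

big = list(range(1000000))   # a million distinct int objects

t0 = time.perf_counter()
del big                      # refcount hits zero: every element is freed here
dealloc_ms = (time.perf_counter() - t0) * 1e3
print("deallocating 1M ints took %.3f ms" % dealloc_ms)
```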
FWIW, on my 2GHz Pentium 4 Linux system, your 1st code block prints values in the range 38 - 41µs, but if I change the loop to use xrange() then the time drops to 31 - 34µs.
If the code is modified to prevent garbage collection of the range() list until after sum(a) is calculated, then it makes little difference in the timing whether we use range() or xrange().
import time

a = range(500)
sum(a)

bigrange = xrange(1000000)
for i in bigrange:
    pass

st = time.time()
sum(a)
print (time.time() - st) * 1e6

# Maintain a reference to bigrange to prevent it
# being garbage collected earlier
print sum(bigrange)
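For readers on Python 3, an equivalent sketch of the snippet above (range() is already lazy there, so no xrange() is needed):

```python
import time

a = list(range(500))
sum(a)

bigrange = range(1000000)   # lazy in Python 3; no million-element list is built
for i in bigrange:
    pass

st = time.time()
sum(a)
print((time.time() - st) * 1e6)

# Keep a reference to bigrange so it can't be collected earlier
print(sum(bigrange))
```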
Upvotes: 5