Yuzhi Xu

Reputation: 11

How to handle CPU cache in Python (or fastest way to call a function once)

I have found that Python's VM seems to be very unfriendly to the CPU cache.

for example:

import time
a = range(500)

sum(a)

for i in range(1000000): #just to create a time interval, seems this disturb cpu cache?
    pass


st = time.time()
sum(a)
print (time.time() - st)*1e6

time:

> 100us

another case:

import time
a = range(500)

for i in range(100000):
    st = time.time()
    sum(a)
    print (time.time() - st)*1e6

time:

~ 20us

We can see that when it runs frequently, the code becomes much faster.

Is there a solution?

We can refer to this question (it should be the same problem): python function (or a code block) runs much slower with a time interval in a loop

I feel this question is very difficult: one must have an in-depth understanding of the mechanisms of the Python virtual machine, C, and the CPU cache.

Do you have any suggestion about where to post this question for a possible answer?

Upvotes: 1

Views: 2188

Answers (1)

PM 2Ring

Reputation: 55499

Your hypothesis "we can see when running frequently, the code becomes much faster" is not quite correct.

This loop

#just to create a time interval, seems this disturb cpu cache?
for i in range(1000000): 
    pass

isn't merely a simple time delay. In Python 2 it creates a list of 1,000,000 integer objects, loops over it, and then destroys it. In Python 3, range() returns a lazy sequence object rather than a list, so it's much less wasteful of memory; the Python 2 equivalent is xrange().
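The memory difference is easy to observe with sys.getsizeof (a quick sketch in Python 3 syntax, not part of the original answer; note that getsizeof on a list counts only the pointer array, not the integer objects it points to):

```python
import sys

# A Python 3 range object stores only start/stop/step, so its size
# is tiny and independent of its length.
lazy = range(1000000)
materialized = list(lazy)

print(sys.getsizeof(lazy))          # small, fixed size
print(sys.getsizeof(materialized))  # a multi-megabyte pointer array
```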

In the standard CPython implementation, a list is an array of pointers to the objects it contains. On a 32 bit system, an integer object occupies 12 bytes, an empty list object occupies 32 bytes, and of course a pointer occupies 4 bytes. So range(1000000) consumes a total of 32 + 1000000 * (4 + 12) = 16000032 bytes.
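The arithmetic works out as follows, using just the figures quoted above for 32-bit CPython 2:

```python
# Per-object sizes on a 32-bit system, as stated in the answer
list_header = 32   # bytes for an empty list object
pointer = 4        # bytes per list slot
int_object = 12    # bytes per integer object

total = list_header + 1000000 * (pointer + int_object)
print(total)  # 16000032
```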

Once the loop finishes, the reference count on the range() list drops to zero so it and its contents are eligible for garbage collection. It takes a little bit of time to garbage collect a million integers, so it's not surprising that you notice a time delay after the loop is finished. Also, the process of creating & deallocating all those objects will generally cause some memory fragmentation, and that will impact the efficiency of subsequent code.
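CPython's immediate, reference-count-driven deallocation can be observed with the weakref module (a small Python 3 sketch of my own, not from the original answer):

```python
import weakref

class Payload(object):
    pass

obj = Payload()
ref = weakref.ref(obj)

# While a strong reference exists, the weak reference resolves.
print(ref() is obj)   # True

# Dropping the last strong reference frees the object at once;
# CPython does not wait for a garbage-collection cycle to run.
del obj
print(ref() is None)  # True
```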

FWIW, on my 2GHz Pentium 4 Linux system, your 1st code block prints values in the range 38 - 41µs, but if I change the loop to use xrange() then the time drops to 31 - 34µs.

If the code is modified to prevent garbage collection of the range() until after sum(a) is calculated, then it makes little difference in the timing whether we use range() or xrange().

import time

a = range(500)
sum(a)

bigrange = xrange(1000000)

for i in bigrange:
    pass

st = time.time()
sum(a)
print (time.time() - st) * 1e6

#Maintain a reference to bigrange to prevent it 
#being garbage collected earlier
print sum(bigrange)
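As a side note (not part of the original answer), the timeit module sidesteps much of this measurement noise by repeating the call and letting you take the best of several runs; a Python 3 sketch:

```python
import timeit

a = range(500)  # a lazy range object in Python 3

# Run sum(a) 1000 times per pass, take the fastest of 5 passes;
# the minimum suppresses one-off interference such as deallocation pauses.
best = min(timeit.repeat(lambda: sum(a), number=1000, repeat=5))
print("%.2f us per call" % (best / 1000 * 1e6))
```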

Upvotes: 5
