Reputation: 13869
Suppose there is a class A and a list of instances of class A called lst. Suppose we want to call a specific method, m, of class A millions and millions of times, over and over again, on every instance in our list (practical example: an entity.update() method in a game loop). We know that the easy way to do this is the following:
for obj in lst: obj.m()
However, that kind of code puts us to sleep. So we think of using map in the following way:
map(lambda obj: obj.m(), lst)
But we run a few timing tests on the above line of code and it turns out that it's much, much slower than our simple for loop. Sometimes it's even 2 times as slow. Then we think to ourselves, "hmm, maybe it's slower because map constructs a list of return values for all the function calls and returns that list".
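As an aside, this hypothesis is easy to check: in Python 2, map really does eagerly build and return a list, whereas in Python 3 map became lazy and returns an iterator that does no work until consumed (a quick check, written in Python 3 syntax):

```python
# In Python 3, map is lazy: it returns a 'map' iterator object and does
# no work until it is consumed, so the "builds a list of return values"
# overhead only exists in Python 2 (or when you wrap it in list()).
m = map(abs, [-1, -2, -3])
print(type(m).__name__)   # 'map' in Python 3 (it would be 'list' in Python 2)
print(list(m))            # consuming the iterator yields [1, 2, 3]
```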
Suppose we take inspiration from the lazy and memory-efficient built-in function called xrange, which, for the most part, is a cooler version of range. So we define a function called xmap that simply applies a function over a list of objects without constructing and returning a list of return values. The implementation is the following:
def xmap(func, lst):
    for obj in lst:
        func(obj)
Pretty cool, because this function just executes the for loop above, only it allows us to stay fancy and pass in our lambda functions. We think this is the perfect compromise. But we are meticulous and careful, so we decide to make 2 scripts to test the speed of our code and see if we actually made it any faster than map.
Our first script will simply use map and uselessly construct a list that we don't even need.

script1.py:
class A:
    def m(self):
        pass

lst = [A() for i in xrange(15)]

import time
start = time.time()
for i in xrange(1000000):
    map(lambda obj: obj.m(), lst)
print time.time()-start, 'seconds'
Our second script will use xmap, and we believe that it will be faster because it won't have to construct a list of 15 return values 1,000,000 times and return it.

script2.py:
def xmap(func, lst):
    for obj in lst:
        func(obj)

class A:
    def m(self):
        pass

lst = [A() for i in xrange(15)]

import time
start = time.time()
for i in xrange(1000000):
    xmap(lambda obj: obj.m(), lst)
print time.time()-start, 'seconds'
Finally we are done and somewhat excited to see how much faster our code will be. However, after running both scripts a few times against each other, it turns out that script2.py does not seem any faster than script1.py. Actually, as it turns out, script2.py sometimes takes even longer to run than script1.py. xmap seems to take, more or less, the same amount of time as map.
Why did I get these results?
C:\dev\py>python script1.py
14.7799999714 seconds
C:\dev\py>python script2.py
14.2170000076 seconds
C:\dev\py>python script1.py
12.1800000668 seconds
C:\dev\py>python script2.py
12.5759999752 seconds
C:\dev\py>python script1.py
14.3020000458 seconds
C:\dev\py>python script2.py
14.9490001202 seconds
C:\dev\py>python script1.py
14.6879999638 seconds
C:\dev\py>python script2.py
14.3139998913 seconds
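For reference, the same head-to-head can be reproduced with the timeit module, which avoids hand-rolled timing; this sketch uses Python 3 names (range instead of xrange, and list() around map to force the eager list construction that Python 2's map does automatically), so treat the absolute numbers as illustrative only:

```python
import timeit

# Shared setup: the class, the xmap helper, and the 15-element list.
setup = """
class A:
    def m(self):
        pass

def xmap(func, lst):
    for obj in lst:
        func(obj)

lst = [A() for i in range(15)]
"""

# list(map(...)) mirrors Python 2's eager map; xmap skips the list.
t_map = timeit.timeit('list(map(lambda obj: obj.m(), lst))',
                      setup=setup, number=100000)
t_xmap = timeit.timeit('xmap(lambda obj: obj.m(), lst)',
                       setup=setup, number=100000)
print('map: %.3fs  xmap: %.3fs' % (t_map, t_xmap))
```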
I thought that I would at least have optimized something away from map, because I wasn't constructing a list of return values, but my code doesn't seem to be any faster. I know for a fact that list construction takes some time, because I've done the following:
>>> import timeit
>>> timeit.timeit('[]')
0.1297345953932106
>>> timeit.timeit('[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]')
0.6807682686160632
>>> timeit.timeit('[None,None,None,None,None,None,None,None,None,None,None,None,None,None,None]')
0.7460120889200539
So why does my xmap function not seem to be any faster than map?
Upvotes: 1
Views: 603
Reputation: 353419
First, and most importantly: timing micro-optimizations like this is only going to confuse you, because you're measuring very low-level overheads on things like function calls.

In this case, though, I'm pretty sure the issue is that any gains you might get from map/xmap are being lost to the extra lambda. If you use A.m directly instead, things look much better:
>>> %timeit for obj in lst: obj.m()
100000 loops, best of 3: 2.99 µs per loop
>>> %timeit [obj.m() for obj in lst]
100000 loops, best of 3: 3.5 µs per loop
>>> %timeit xmap(lambda obj: obj.m(), lst)
100000 loops, best of 3: 5.69 µs per loop
>>> %timeit xmap(A.m, lst)
100000 loops, best of 3: 3.32 µs per loop
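The reason passing A.m works: in Python 2 it is an unbound method, and in Python 3 a plain function, but either way A.m(obj) runs the same code as obj.m() while skipping the extra lambda frame, so each element costs one Python-level call instead of two (a small sanity check, in Python 3):

```python
class A:
    def m(self):
        return id(self)

obj = A()
# A.m(obj) and obj.m() execute the same method body. The difference is
# that passing A.m to xmap costs one Python-level call per element,
# whereas the lambda version costs two (the lambda, then the method).
assert A.m(obj) == obj.m()
```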
FWIW, it looks to me like eventually your xmap can win:
>>> lst = [A() for i in xrange(10**3)]
>>> %timeit for obj in lst: obj.m()
1000 loops, best of 3: 198 µs per loop
>>> %timeit [obj.m() for obj in lst]
1000 loops, best of 3: 216 µs per loop
>>> %timeit xmap(lambda obj: obj.m(), lst)
1000 loops, best of 3: 353 µs per loop
>>> %timeit xmap(A.m, lst)
10000 loops, best of 3: 189 µs per loop
but I wouldn't take these numbers too seriously either.
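To see where the lambda's cost comes from, the dis module shows that the lambda body itself has to load obj, look up .m, call it, and return, so every element pays for a full extra Python-level call wrapped around the method call (a sketch; the exact opcodes vary by Python version):

```python
import dis

# Disassemble the lambda: its bytecode is a whole extra Python frame
# (load obj, look up .m, call, return) executed once per element,
# on top of the obj.m() call it forwards to.
dis.dis(lambda obj: obj.m())
```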
When you say "that kind of code [i.e. simple for loops] puts us to sleep", I agree-- writing simple loops means that you're done programming much faster and can get to bed earlier.
Upvotes: 7
Reputation: 2932
map is implemented in C, and works in C-loop-land. You are running in Python-loop-land.
Upvotes: 3
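Put differently: map only pays off when the mapped function is itself implemented in C, so the entire loop stays in C; with a Python-level lambda you bounce back into the interpreter on every element. A rough illustration (Python 3; absolute numbers will vary by machine):

```python
import timeit

data = list(range(1000))

# C builtin mapped by map's C loop: no Python frames per element.
t_builtin = timeit.timeit('list(map(abs, data))',
                          globals={'data': data}, number=1000)

# Python lambda mapped by map's C loop: one interpreter call per element.
t_lambda = timeit.timeit('list(map(lambda x: abs(x), data))',
                         globals={'data': data}, number=1000)

print('builtin: %.4fs  lambda: %.4fs' % (t_builtin, t_lambda))
```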