I set up a simple custom function that takes some default arguments (Python 3.5):
def foo(a=10, b=20, c=30, d=40):
    return a * b + c * d
and timed different calls to it with or without specifying argument values:
Without specifying arguments:
%timeit foo()
The slowest run took 7.83 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 361 ns per loop
Specifying arguments:
%timeit foo(a=10, b=20, c=30, d=40)
The slowest run took 12.83 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 446 ns per loop
As you can see, there is a noticeable difference in time between the call that specifies argument values and the one that relies on the defaults. In a simple one-off call this might be negligible, but the overhead scales and becomes more noticeable when a large number of calls to the function are made:
No arguments:
%timeit for i in range(10000): foo()
100 loops, best of 3: 3.83 ms per loop
With Arguments:
%timeit for i in range(10000): foo(a=10, b=20, c=30, d=40)
100 loops, best of 3: 4.68 ms per loop
The same behaviour is also present in Python 2.7, where the time difference between these calls was actually a bit larger: foo() -> 291 ns and foo(a=10, b=20, c=30, d=40) -> 410 ns.
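For completeness, here is how the same measurement can be reproduced without IPython, using the standard timeit module (a rough sketch; absolute numbers will of course vary by machine and Python version):

import timeit

setup = "def foo(a=10, b=20, c=30, d=40): return a * b + c * d"

# timeit.timeit returns total seconds for `number` calls;
# multiplying by 1000 converts 1,000,000 calls to ns per call.
no_args = timeit.timeit("foo()", setup=setup, number=1000000)
kw_args = timeit.timeit("foo(a=10, b=20, c=30, d=40)", setup=setup, number=1000000)
print("defaults: %.0f ns per call" % (no_args * 1000))
print("keywords: %.0f ns per call" % (kw_args * 1000))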
Why does this happen? Should I generally try and avoid specifying argument values during calls?
Upvotes: 10
Views: 786
Why does this happen? Should I avoid specifying argument values during calls?
Generally, no. The real reason you are able to see this at all is that the function you are timing is simply not computationally intensive. As such, the time required for the additional byte code instructions issued when arguments are supplied can be detected through timing.
If, for example, you had a more intensive function of the form:
def foo_intensive(a=10, b=20, c=30, d=40):
    [i * j for i in range(a * b) for j in range(c * d)]
it will show pretty much no difference whatsoever in the time required:
%timeit foo_intensive()
10 loops, best of 3: 32.7 ms per loop
%timeit foo_intensive(a=10, b=20, c=30, d=40)
10 loops, best of 3: 32.7 ms per loop
Even when scaled to more calls, the time required to execute the function body simply trumps the small overhead introduced by additional byte code instructions.
One way of viewing the byte code issued for each call case is to create functions that wrap around foo and call it in different ways. For now, let's create fooDefaults for calls using the default arguments and fooKwargs for calls specifying keyword arguments:
# call foo without arguments, using the defaults
def fooDefaults():
    foo()

# call foo with keyword arguments
def fooKwargs():
    foo(a=10, b=20, c=30, d=40)
Now, with dis we can see the differences in byte code between these two. For the default version, essentially a single instruction is issued for the call itself, CALL_FUNCTION (ignoring POP_TOP, which is present in both cases):
dis.dis(fooDefaults)
2 0 LOAD_GLOBAL 0 (foo)
3 CALL_FUNCTION 0 (0 positional, 0 keyword pair)
6 POP_TOP
7 LOAD_CONST 0 (None)
10 RETURN_VALUE
On the other hand, in the case where keywords are used, 8 more LOAD_CONST instructions are issued in order to load the argument names ('a', 'b', 'c', 'd') and values (10, 20, 30, 40) onto the value stack (even though loading numbers < 256 is probably really fast in this case, since they are cached):
dis.dis(fooKwargs)
2 0 LOAD_GLOBAL 0 (foo)
3 LOAD_CONST 1 ('a') # call starts
6 LOAD_CONST 2 (10)
9 LOAD_CONST 3 ('b')
12 LOAD_CONST 4 (20)
15 LOAD_CONST 5 ('c')
18 LOAD_CONST 6 (30)
21 LOAD_CONST 7 ('d')
24 LOAD_CONST 8 (40)
27 CALL_FUNCTION 1024 (0 positional, 4 keyword pair)
30 POP_TOP # call ends
31 LOAD_CONST 0 (None)
34 RETURN_VALUE
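To make the count concrete, dis.get_instructions (available since Python 3.4) lets you list the opcodes of each wrapper programmatically. A small sketch, keeping in mind that the exact opcodes differ between CPython versions (newer versions use CALL_FUNCTION_KW or KW_NAMES instead of the 3.5 forms shown above):

import dis

def foo(a=10, b=20, c=30, d=40):
    return a * b + c * d

def fooDefaults():
    foo()

def fooKwargs():
    foo(a=10, b=20, c=30, d=40)

# List the opcodes per wrapper; on 3.5 the keyword version carries eight
# extra LOAD_CONSTs for the four name/value pairs.
for f in (fooDefaults, fooKwargs):
    ops = [ins.opname for ins in dis.get_instructions(f)]
    print(f.__name__, len(ops), ops)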
Additionally, a few extra steps are generally required in the case where the number of keyword arguments is not zero (for example, in ceval's _PyEval_EvalCodeWithName()).
Even though these are really fast instructions, they do add up. The more arguments, the bigger the sum, and when many calls to the function are performed, they pile up into a noticeable difference in execution time.
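A quick way to watch the per-argument cost add up is to time the same function with an increasing number of keyword pairs (a rough sketch, not from the original post; absolute numbers are machine-dependent):

import timeit

setup = "def foo(a=10, b=20, c=30, d=40): return a * b + c * d"

# Each extra keyword pair adds two LOAD_CONSTs, so the per-call time
# should creep up roughly in step with the number of pairs.
for call in ("foo()", "foo(a=10)", "foo(a=10, b=20)",
             "foo(a=10, b=20, c=30)", "foo(a=10, b=20, c=30, d=40)"):
    t = timeit.timeit(call, setup=setup, number=1000000)
    print("%-35s %.0f ns per call" % (call, t * 1000))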
A direct result of this is that the more values we specify in the call, the more instructions must be issued and the slower the function runs. Additionally, specifying positional arguments, unpacking positional arguments and unpacking keyword arguments each have a different amount of overhead associated with them:

- foo(10, 20, 30, 40): requires 4 additional instructions to load each value.
- foo(*[10, 20, 30, 40]): 4 LOAD_CONST instructions and an additional BUILD_LIST instruction. foo(*l) cuts execution down a bit, since we provide an already built list containing the values.
- foo(**{'a': 10, 'b': 20, 'c': 30, 'd': 40}): 8 LOAD_CONST instructions and a BUILD_MAP. foo(**d) will cut execution down, because an already built dictionary is supplied.

All in all, the ordering of execution times for the different call cases is:

defaults < positionals < keyword arguments < list unpacking < dictionary unpacking
I suggest using dis.dis on these cases and seeing their differences.
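For example, a small benchmark along these lines, assuming the same foo as above, lets you check that ordering yourself (running dis.dis on each statement shows the corresponding instructions):

import timeit

setup = """
def foo(a=10, b=20, c=30, d=40): return a * b + c * d
l = [10, 20, 30, 40]
d = {'a': 10, 'b': 20, 'c': 30, 'd': 40}
"""

# One statement per call style, in the claimed order of increasing cost.
for call in ("foo()",                          # defaults
             "foo(10, 20, 30, 40)",            # positionals
             "foo(a=10, b=20, c=30, d=40)",    # keyword arguments
             "foo(*l)",                        # list unpacking
             "foo(**d)"):                      # dictionary unpacking
    t = timeit.timeit(call, setup=setup, number=1000000)
    print("%-30s %.0f ns per call" % (call, t * 1000))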
As @goofd pointed out in a comment, this is really not something one should worry about; it depends on the use case. If you frequently call computationally 'light' functions, relying on the defaults will produce a slight speed boost. If you frequently need to supply different values, there is next to nothing to be gained.
So, it's probably negligible, and trying to get boosts from obscure edge cases like this is really pushing it. If you find yourself doing this, you might want to look at things like PyPy and Cython.
Upvotes: 17