I set up a simple custom function that takes some default arguments (Python 3.5):
def foo(a=10, b=20, c=30, d=40):
    return a * b + c * d
and timed different calls to it with or without specifying argument values:
Without specifying arguments:
%timeit foo()
The slowest run took 7.83 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 361 ns per loop
Specifying arguments:
%timeit foo(a=10, b=20, c=30, d=40)
The slowest run took 12.83 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 446 ns per loop
As you can see, there is a noticeable difference in time between the call that specifies argument values and the one that relies on the defaults. In a simple one-off call this might be negligible, but the overhead scales and becomes more noticeable when a large number of calls to the function are made:
No arguments:
%timeit for i in range(10000): foo()
100 loops, best of 3: 3.83 ms per loop
With Arguments:
%timeit for i in range(10000): foo(a=10, b=20, c=30, d=40)
100 loops, best of 3: 4.68 ms per loop
The same behaviour is also present in Python 2.7, where the time difference between these calls was actually a bit larger: foo() -> 291 ns and foo(a=10, b=20, c=30, d=40) -> 410 ns.
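For completeness, here is how the same measurement can be reproduced without IPython, using the standard timeit module (a rough sketch; absolute numbers will of course vary by machine and Python version):

import timeit

setup = "def foo(a=10, b=20, c=30, d=40): return a * b + c * d"

# timeit.timeit returns total seconds for `number` calls;
# multiplying by 1000 converts 1,000,000 calls to ns per call.
no_args = timeit.timeit("foo()", setup=setup, number=1000000)
kw_args = timeit.timeit("foo(a=10, b=20, c=30, d=40)", setup=setup, number=1000000)
print("defaults: %.0f ns per call" % (no_args * 1000))
print("keywords: %.0f ns per call" % (kw_args * 1000))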
Why does this happen? Should I generally try and avoid specifying argument values during calls?
Upvotes: 10
Views: 786
Why does this happen? Should I avoid specifying argument values during calls?
Generally, no. The real reason you are able to see this at all is that the function you are timing is simply not computationally intensive. As such, the time required for the additional byte code instructions issued when arguments are supplied can be detected through timing.
If, for example, you had a more intensive function of the form:
def foo_intensive(a=10, b=20, c=30, d=40):
    [i * j for i in range(a * b) for j in range(c * d)]
it will show pretty much no difference whatsoever in the time required:
%timeit foo_intensive()
10 loops, best of 3: 32.7 ms per loop
%timeit foo_intensive(a=10, b=20, c=30, d=40)
10 loops, best of 3: 32.7 ms per loop
Even when scaled to more calls, the time required to execute the function body simply trumps the small overhead introduced by additional byte code instructions.
One way of viewing the byte code issued for each call case is to create functions that wrap around foo and call it in different ways. For now, let's create fooDefaults for calls using the default arguments and fooKwargs for calls specifying keyword arguments:
# call foo without arguments, using the defaults
def fooDefaults():
    foo()

# call foo with keyword arguments
def fooKwargs():
    foo(a=10, b=20, c=30, d=40)
Now, with dis we can see the differences in byte code between these two. For the default version, essentially a single instruction is issued for the call itself, CALL_FUNCTION (ignoring POP_TOP, which is present in both cases):
dis.dis(fooDefaults)
2 0 LOAD_GLOBAL 0 (foo)
3 CALL_FUNCTION 0 (0 positional, 0 keyword pair)
6 POP_TOP
7 LOAD_CONST 0 (None)
10 RETURN_VALUE
On the other hand, in the case where keywords are used, 8 more LOAD_CONST instructions are issued in order to load the argument names ('a', 'b', 'c', 'd') and values (10, 20, 30, 40) onto the value stack (even though loading numbers < 256 is probably really fast in this case, since they are cached):
dis.dis(fooKwargs)
2 0 LOAD_GLOBAL 0 (foo)
3 LOAD_CONST 1 ('a') # call starts
6 LOAD_CONST 2 (10)
9 LOAD_CONST 3 ('b')
12 LOAD_CONST 4 (20)
15 LOAD_CONST 5 ('c')
18 LOAD_CONST 6 (30)
21 LOAD_CONST 7 ('d')
24 LOAD_CONST 8 (40)
27 CALL_FUNCTION 1024 (0 positional, 4 keyword pair)
30 POP_TOP # call ends
31 LOAD_CONST 0 (None)
34 RETURN_VALUE
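To make the count concrete, dis.get_instructions (available since Python 3.4) lets you list the opcodes of each wrapper programmatically. A small sketch, keeping in mind that the exact opcodes differ between CPython versions (newer versions use CALL_FUNCTION_KW or KW_NAMES instead of the 3.5 forms shown above):

import dis

def foo(a=10, b=20, c=30, d=40):
    return a * b + c * d

def fooDefaults():
    foo()

def fooKwargs():
    foo(a=10, b=20, c=30, d=40)

# List the opcodes per wrapper; on 3.5 the keyword version carries eight
# extra LOAD_CONSTs for the four name/value pairs.
for f in (fooDefaults, fooKwargs):
    ops = [ins.opname for ins in dis.get_instructions(f)]
    print(f.__name__, len(ops), ops)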
Additionally, a few extra steps are generally required in the case where the number of keyword arguments is not zero (for example, in ceval's _PyEval_EvalCodeWithName()).
Even though these are really fast instructions, they do add up. The more arguments, the bigger the sum, and when many calls to the function are performed, they pile up into a noticeable difference in execution time.
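A quick way to watch the per-argument cost add up is to time the same function with an increasing number of keyword pairs (a rough sketch, not from the original post; absolute numbers are machine-dependent):

import timeit

setup = "def foo(a=10, b=20, c=30, d=40): return a * b + c * d"

# Each extra keyword pair adds two LOAD_CONSTs, so the per-call time
# should creep up roughly in step with the number of pairs.
for call in ("foo()", "foo(a=10)", "foo(a=10, b=20)",
             "foo(a=10, b=20, c=30)", "foo(a=10, b=20, c=30, d=40)"):
    t = timeit.timeit(call, setup=setup, number=1000000)
    print("%-35s %.0f ns per call" % (call, t * 1000))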
A direct result of this is that the more values we specify in the call, the more instructions must be issued and the slower the function runs. Additionally, specifying positional arguments, unpacking positional arguments and unpacking keyword arguments each have a different amount of overhead associated with them:

- foo(10, 20, 30, 40): requires 4 additional instructions to load each value.
- foo(*[10, 20, 30, 40]): 4 LOAD_CONST instructions and an additional BUILD_LIST instruction. foo(*l) cuts execution down a bit, since we provide an already built list containing the values.
- foo(**{'a': 10, 'b': 20, 'c': 30, 'd': 40}): 8 LOAD_CONST instructions and a BUILD_MAP. foo(**d) will cut execution down, because an already built dictionary is supplied.

All in all, the ordering of execution times for the different call cases is:

defaults < positionals < keyword arguments < list unpacking < dictionary unpacking
I suggest using dis.dis on these cases and seeing their differences.
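For example, a small benchmark along these lines, assuming the same foo as above, lets you check that ordering yourself (running dis.dis on each statement shows the corresponding instructions):

import timeit

setup = """
def foo(a=10, b=20, c=30, d=40): return a * b + c * d
l = [10, 20, 30, 40]
d = {'a': 10, 'b': 20, 'c': 30, 'd': 40}
"""

# One statement per call style, in the claimed order of increasing cost.
for call in ("foo()",                          # defaults
             "foo(10, 20, 30, 40)",            # positionals
             "foo(a=10, b=20, c=30, d=40)",    # keyword arguments
             "foo(*l)",                        # list unpacking
             "foo(**d)"):                      # dictionary unpacking
    t = timeit.timeit(call, setup=setup, number=1000000)
    print("%-30s %.0f ns per call" % (call, t * 1000))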
As @goofd pointed out in a comment, this is really not something one should worry about; it depends on the use case. If you frequently call computationally 'light' functions, relying on the defaults will produce a slight speed boost. If you frequently need to supply different values, there is next to nothing to be gained.
So, it's probably negligible, and trying to get boosts from obscure edge cases like this is really pushing it. If you find yourself doing this, you might want to look at things like PyPy and Cython.
Upvotes: 17