Reputation: 2957
While running a numerical integrator, I noticed a noticeable difference in speed depending on how I extract the value of the field in a dictionary
import numpy as np
def bad_get(mydict):
'''Extract the name field using get()'''
output = mydict.get('name', None)
return output
def good_get(mydict):
'''Extract the name field using if-else'''
if 'name' in mydict:
output = mydict['name']
else:
output = None
return output
name_dict = dict()
name_dict['name'] = np.zeros((5000,5000))
On my system, I notice the following difference (using iPython)
%%timeit
bad_get(name_dict)
The slowest run took 7.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 247 ns per loop
Compared to
%%timeit
good_get(name_dict)
1000000 loops, best of 3: 188 ns per loop
This may seem like a small difference, but in for some arrays the difference appears to be even more dramatic. What causes this behavior, and is there some way I should alter my use of the get()
function?
Upvotes: 18
Views: 6617
Reputation: 1121484
Python has to do more work for dict.get()
:
get
is an attribute, so Python has to look this up, and then bind the descriptor found to the dictionary instance.()
is a call, so the current frame has to be pushed on the stack, a call has to be made, then the frame has to be popped again from the stack to continue.The [...]
notation, used with a dict
, doesn't require a separate attribute step or frame push and pop.
You can see the difference when you use the Python bytecode disassembler dis
:
>>> import dis
>>> dis.dis(compile('d[key]', '', 'eval'))
1 0 LOAD_NAME 0 (d)
3 LOAD_NAME 1 (key)
6 BINARY_SUBSCR
7 RETURN_VALUE
>>> dis.dis(compile('d.get(key)', '', 'eval'))
1 0 LOAD_NAME 0 (d)
3 LOAD_ATTR 1 (get)
6 LOAD_NAME 2 (key)
9 CALL_FUNCTION 1
12 RETURN_VALUE
so the d[key]
expression only has to execute a BINARY_SUBSCR
opcode, while d.get(key)
adds a LOAD_ATTR
opcode. CALL_FUNCTION
is a lot more expensive than BINARY_SUBSCR
on a built-in type (custom types with __getitem__
methods still end up doing a function call).
If the majority of your keys exist in the dictionary, you could use try...except KeyError
to handle missing keys:
try:
return mydict['name']
except KeyError:
return None
Exception handling is cheap if there are no exceptions.
Upvotes: 72