Reputation: 587
In my code, I have a list l
, and I'm creating a dictionary of lists from it. (I'm grouping objects which have the same key
). Implementing it via both a try
statement and a if
condition, I've noticed in line_profiler that the former seems much more efficient :
Line # Hits Time Per Hit % Time Line Contents
==============================================================
293 44378450 59805020 1.3 16.9 for element in l:
# stuff that compute 'key' from 'element'
302 2234869 2235518 1.0 0.6 try:
303 2234869 82486133 36.9 23.3 d[key].append(element)
304 57358 72499 1.3 0.0 except KeyError:
305 57358 1758248 30.7 0.5 d[key] = [element]
vs:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
293 44378450 60185880 1.4 14.0 for element in l:
# stuff that compute 'key' from 'element'
307 2234869 81683512 36.5 19.1 if key in d.keys():
308 2177511 76393595 35.1 17.8 d.get(key).append(element)
309 else:
310 57358 1717679 29.9 0.4 d[key] = [element]
I understand that with try
, you only go into except
when an exception is raised (so with few exceptions, it makes sense it cost less overall than testing the condition every time), but here, even the Time per hit
is slower for the exception (1.3+30.7 µs) than for the test condition (36.5 µs). I thought raising exception would be more costly than checking if a key is in the dictionary (in
just test the hash key, no ? It's not a line search). So why is that ?
Upvotes: 4
Views: 4828
Reputation: 39
I learned in one of my classes that the processor tries to rpedict and load in memory the instructions. If you put a if statement, the processor does not know which code to load and there is a waste of time loaing/unloading instructions etc.
Probably, in case of an exception, the processor assum tat the exception is rare and loads the main code without considering the exception...
I don't know f it is the case here, but that is what we learned in code optimizartion class.
source(class https://www.etsmtl.ca/etudes/cours/gti320/)
Upvotes: 0
Reputation: 14379
The additional runtime comes from the .keys()
call. If you want to prevent that extra call and still stay with if
and else
, try something like this:
obj = d.get(key)
if obj:
obj.append(element)
else:
d[key] = [element]
Alternatively you can use a defaultdict
which does this in the background. Example:
from collections import defaultdict
d = defaultdict(list)
d['123'].append('abc')
Upvotes: 6
Reputation: 90979
try...except
is slower only if the number of exceptions actually raised is comparable to the number of times the loop is executed. In your case, exceptions are only raised 2.5% of loop iterations.
Lets analyse the four scenarios below -
def func1():
l = [1,2,3,4]
d = {}
for e in l:
k = e - 1
try:
d[k].append(e)
except KeyError:
d[k] = [e]
return d
def func2():
l = [1,2,3,4]
d = {}
for e in l:
k = e - 1
if k in d.keys():
d.get(k).append(e)
else:
d[k] = [e]
return d
def func3():
l = [1,2,3,4]
d = {}
for e in l:
k = 1
try:
d[k].append(e)
except KeyError:
d[k] = [e]
return d
def func4():
l = [1,2,3,4]
d = {}
for e in l:
k = 1
if k in d.keys():
d.get(k).append(e)
else:
d[k] = [e]
return d
The timing results for this -
In [7]: %timeit func1()
The slowest run took 4.17 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.55 µs per loop
In [8]: %timeit func2()
1000000 loops, best of 3: 1.77 µs per loop
In [10]: %timeit func3()
The slowest run took 4.34 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 2.01 µs per loop
In [11]: %timeit func4()
The slowest run took 6.83 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.4 µs per loop
In case of func1()
and func2()
, each element is going into a separate list, and hence for each key the try..except
block raises and catches an exception. At that case, func2()
is faster.
In case of func3()
and func4()
, the exception is only thrown once, so the overhead of exception only occurs once, whereas the condition is still checked for each key (even if its present) , and that is why in that case, the try..except
is faster.
I am guessing something similar may be occurring in your case, where the same key is computed multiple times, and hence its faster with the try..except
block. You can check out how many actual elements are there in the list vs how many keys are there in the dictionary to find out if that could be the reason.
Assuming the hits
column is the number of times that particular line was executed, you can see that , the line -
d[key].append(element)
Was executed 2234869 times, whereas exception was raised only - 57358 times , Which is only 2.56% of the total number of elements.
Upvotes: 4
Reputation: 351
You should think that at every iteration you loose some time testing if condition checks. If you don't raise an exception your code will be faster with try except, otherwise it takes a little bit more time to treat the exception.
In other word, if you are sure that your exception is extraodinary (it happens only in exceptional cases) it's cheaper for you to use try-except. Otherwise, if you except an exception in ~50% of cases it's better for you to use if.
("it's easier to ask for forgiveness than permission")
Upvotes: 2