Reputation: 651
Let's assume this example here with two classes:
from timeit import Timer
from random import randint
import numpy as np
# Python version 3.8.3
class Value:
def __init__(self, val):
self.val = val
class Object:
def __init__(self):
self.d = dict()
self.counter = 0
def add_obj(self, obj):
self.d[self.counter] = obj
self.counter +=1
def return_val(self):
return self.d
obj = Object()
for _ in range(100000):
name = chr(np.random.randint(ord('a'), ord('z')))
info = Value(name)
obj.add_obj(info)
mydict = obj.return_val()
Now, I simply want to use next() to search and retrieve the first occurrence of an element in the dict or return None, otherwise. However, I ma observing that both f1 and f3 functions are significantly slower that the traditional f2 approach. Any tips for performing this search in a more efficient way is appreciated. For the f3 funct, I used the suggestion provided here: Is next() in python really that fast?
def f1():
matched_elem = next((elem for elem in mydict.values() if str(elem.val) == 'a'), None)
return matched_elem
def f2():
for elem in mydict.values():
if 'a' == str(elem.val):
return elem
return None
def f3():
matched_elem = (elem for elem in mydict.values() if str(elem.val) == 'a')
if matched_elem is None:
return None
return next(matched_elem)
print(Timer(f1).timeit())
print(Timer(f2).timeit())
print(Timer(f3).timeit())
0.6597451590000001
0.397709414
0.682832415
Upvotes: 0
Views: 117
Reputation: 77377
Originally the problem was that f2
always exited after the first comparison... the else:
clause was in the wrong place. With the edit, You are seeing the extra overhead needed to create the generator object, call next and get the return. Its a pretty trivial difference that mostly goes away the longer the search for a matching value takes. timeit
is calling the function a million times so your print is showing average microseconds.
I ran the code in a loop 10 times and included a count for how far in the first "a" was, and these are the results. Notice that the time difference nearly disappears as the number of iterations needed to find the "a" increases. And the difference holds steady - that's the generator overhead.
10 t1: 1.909 μs, t2: 1.599 μs, diff: 0.310 pct: 83
2 t1: 0.887 μs, t2: 0.574 μs, diff: 0.313 pct: 64
30 t1: 4.510 μs, t2: 4.241 μs, diff: 0.269 pct: 94
8 t1: 1.703 μs, t2: 1.389 μs, diff: 0.314 pct: 81
5 t1: 1.314 μs, t2: 0.992 μs, diff: 0.322 pct: 75
45 t1: 6.670 μs, t2: 6.450 μs, diff: 0.219 pct: 96
86 t1: 12.689 μs, t2: 12.517 μs, diff: 0.172 pct: 98
9 t1: 1.946 μs, t2: 1.592 μs, diff: 0.354 pct: 81
25 t1: 4.210 μs, t2: 3.873 μs, diff: 0.337 pct: 91
59 t1: 9.064 μs, t2: 8.627 μs, diff: 0.437 pct: 95
The code
from timeit import Timer, timeit
from random import randint
import numpy as np
class Value:
def __init__(self, val):
self.val = val
class Object:
def __init__(self):
self.d = dict()
self.counter = 0
def add_obj(self, obj):
self.d[self.counter] = obj
self.counter +=1
def return_val(self):
return self.d
def f1():
matched_elem = next((elem for elem in mydict.values() if str(elem.val) == 'a'), None)
return matched_elem
def f2():
for elem in mydict.values():
if 'a' == str(elem.val):
return elem
return None
def countit():
for i, elem in enumerate(mydict.values()):
if str(elem.val) == 'a':
return i
for _ in range(10):
obj = Object()
for _ in range(99999):
name = chr(np.random.randint(ord('a'), ord('z')))
info = Value(name)
obj.add_obj(info)
mydict = obj.return_val()
c = countit()
t1 = Timer(f1).timeit()
t2 = Timer(f2).timeit()
print(f"{c:< 4} t1: {t1:2.3f} μs, t2: {t2:2.3f} μs, diff: {t1-t2:2.3f} pct: {int(t2/t1*100)}")
Upvotes: 2