Reputation: 4168
I have a very simple piece of code and was testing it with a normal dictionary as well as a defaultdict,
and surprisingly the defaultdict
is slower than the normal dictionary.
from collections import defaultdict
from timeit import timeit
import time

text = "hello this is python python is a great language, hello again"
d = defaultdict(int)
s = {}

# version using defaultdict: missing keys default to 0
def defdict():
    global text, d
    for word in text.split():
        d[word] += 1

# version using a plain dict: check membership before incrementing
def nordict():
    global text, s
    for word in text.split():
        if word not in s:
            s[word] = 1
        else:
            s[word] += 1

print(timeit(stmt='defdict', setup='from __main__ import defdict', number=3))
print(timeit(stmt='nordict', setup='from __main__ import nordict', number=3))

st = time.time()
defdict()
print(time.time() - st)

st = time.time()
nordict()
print(time.time() - st)
Output
5.799811333417892e-07
3.5099219530820847e-07
6.198883056640625e-06
3.0994415283203125e-06
This is a very simple example, and for this particular case I could of course use Counter,
which would be fastest of all. But I am looking at it from an overall perspective, for cases where we need to do more than just count the occurrences of a key and where we obviously cannot use Counter.
So why am I seeing this behavior? Am I missing something here or doing something the wrong way?
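For reference, the Counter version mentioned above is a one-liner (a minimal sketch, assuming the same text string):

from collections import Counter

text = "hello this is python python is a great language, hello again"

# Counter builds the same word -> count mapping in a single call
counts = Counter(text.split())
print(counts["hello"])  # 2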
Upvotes: 1
Views: 2031
Reputation: 164693
Your test is flawed because of the small size of the string, so fixed costs can outweigh the performance of your iteration logic. A good hint is that your timings are measured in microseconds, which is negligible for benchmarking purposes.
Here's a more reasonable test:
n = 10**5
text = "hello this is python python is a great language, hello again"*n
%timeit defdict() # 445 ms per loop
%timeit nordict() # 520 ms per loop
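If you are not in IPython, a rough equivalent using the timeit module could look like the sketch below. It assumes the question's script is the running __main__, so defdict and nordict can be imported from it; note that stmt has to actually call the functions ('defdict()') for the loop itself to be timed.

from timeit import timeit

# enlarge the input so per-call overhead no longer dominates
n = 10**5
text = "hello this is python python is a great language, hello again" * n

# the parentheses matter: 'defdict()' calls the function,
# while 'defdict' alone only measures a name lookup
print(timeit(stmt='defdict()', setup='from __main__ import defdict', number=3))
print(timeit(stmt='nordict()', setup='from __main__ import nordict', number=3))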
Upvotes: 4