Reputation: 4168
I have a very simple piece of code and was testing it with a normal dictionary as well as a defaultdict,
and surprisingly the defaultdict
is slower than the normal dictionary.
from collections import defaultdict
from timeit import timeit
import time

text = "hello this is python python is a great language, hello again"
d = defaultdict(int)
s = {}

# version using defaultdict: missing keys default to 0
def defdict():
    global text, d
    for word in text.split():
        d[word] += 1

# version using a plain dict: check membership before incrementing
def nordict():
    global text, s
    for word in text.split():
        if word not in s:
            s[word] = 1
        else:
            s[word] += 1

print(timeit(stmt='defdict', setup='from __main__ import defdict', number=3))
print(timeit(stmt='nordict', setup='from __main__ import nordict', number=3))

st = time.time()
defdict()
print(time.time() - st)

st = time.time()
nordict()
print(time.time() - st)
Output
5.799811333417892e-07
3.5099219530820847e-07
6.198883056640625e-06
3.0994415283203125e-06
This is a very simple example, and for this particular case I could of course use Counter,
which would be fastest of all. But I am looking at it from an overall perspective, for cases where we need to do more than just count the occurrences of a key and where we obviously cannot use Counter.
So why am I seeing this behavior? Am I missing something here or doing something the wrong way?
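For reference, the Counter version mentioned above is a one-liner (a minimal sketch, assuming the same text string):

from collections import Counter

text = "hello this is python python is a great language, hello again"

# Counter builds the same word -> count mapping in a single call
counts = Counter(text.split())
print(counts["hello"])  # 2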
Upvotes: 1
Views: 2031
Reputation: 164693
Your test is flawed because of the small size of the string, so fixed costs can outweigh the performance of your iteration logic. A good hint is that your timings are measured in microseconds, which is negligible for benchmarking purposes.
Here's a more reasonable test:
n = 10**5
text = "hello this is python python is a great language, hello again"*n
%timeit defdict() # 445 ms per loop
%timeit nordict() # 520 ms per loop
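If you are not in IPython, a rough equivalent using the timeit module could look like the sketch below. It assumes the question's script is the running __main__, so defdict and nordict can be imported from it; note that stmt has to actually call the functions ('defdict()') for the loop itself to be timed.

from timeit import timeit

# enlarge the input so per-call overhead no longer dominates
n = 10**5
text = "hello this is python python is a great language, hello again" * n

# the parentheses matter: 'defdict()' calls the function,
# while 'defdict' alone only measures a name lookup
print(timeit(stmt='defdict()', setup='from __main__ import defdict', number=3))
print(timeit(stmt='nordict()', setup='from __main__ import nordict', number=3))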
Upvotes: 4