xor
xor

Reputation: 2698

Efficiency of dict.get() method

I have just started learning python and worried that if I use dict.get(key,default_value) or I define my own method for it....so do they have any differences:

[1st method]:

dict={}
for c in string:
    if c in dict:
        dict[c]+=1
    else:
        dict[c]=1

and the other dict.get() method that python provides

for c in string:
    dict[c]=dict.get(c,0)+1

do they have any differences on efficiency or speed...or they are just the same and 2nd one only saves writing few more lines of code...

Upvotes: 1

Views: 1533

Answers (1)

Martijn Pieters
Martijn Pieters

Reputation: 1121484

For this specific case, use either a collections.Counter() or a collections.defaultdict() object instead:

import collections

dct = collections.defaultdict(int)

for c in string:
     dict[c] += 1

or

dct = collections.Counter(string)

Both are subclasses of the standard dict type. The Counter type adds some more helpful functionality like summing two counters or listing the most common entities that have been counted. The defaultdict class can also be given other default types; use defaultdict(list) for example to collect things into lists per key.

When you want to compare performance of two different approaches, you want to use the timeit module:

>>> import timeit
>>> def intest(dct, values):
...     for c in values:
...         if c in dct:
...             dct[c]+=1
...         else:
...             dct[c]=1
... 
>>> def get(dct, values):
...     for c in values:
...         dct[c] = dct.get(c, 0) + 1
... 
>>> values = range(10) * 10
>>> timeit.timeit('test(dct, values)', 'from __main__ import values, intest as test; dct={}')
22.210275888442993
>>> timeit.timeit('test(dct, values)', 'from __main__ import values, get as test; dct={}')
27.442166090011597

This shows that using in is a little faster.

There is, however, a third option to consider; catching the KeyError exception:

>>> def tryexcept(dct, values):
...     for c in values:
...         try:
...             dct[c] += 1
...         except KeyError:
...             dct[c] = 1
... 
>>> timeit.timeit('test(dct, values)', 'from __main__ import values, tryexcept as test; dct={}')
18.023509979248047

which happens to be the fastest, because only 1 in 10 cases are for a new key.

Last but not least, the two alternatives I proposed:

>>> def default(dct, values):
...     for c in values:
...         dct[c] += 1
... 
>>> timeit.timeit('test(dct, values)', 'from __main__ import values, default as test; from collections import defaultdict; dct=defaultdict(int)')
15.277361154556274
>>> timeit.timeit('Counter(values)', 'from __main__ import values; from collections import Counter')
38.657804012298584

So the Counter() type is slowest, but defaultdict is very fast indeed. Counter()s do a lot more work though, and the extra functionality can bring ease of development and execution speed benefits elsewhere.

Upvotes: 6

Related Questions