Reputation: 2698
I have just started learning python and worried that if I use dict.get(key,default_value) or I define my own method for it....so do they have any differences:
[1st method]:
dict={}
for c in string:
if c in dict:
dict[c]+=1
else:
dict[c]=1
and the other dict.get() method that python provides
for c in string:
dict[c]=dict.get(c,0)+1
do they have any differences on efficiency or speed...or they are just the same and 2nd one only saves writing few more lines of code...
Upvotes: 1
Views: 1533
Reputation: 1121484
For this specific case, use either a collections.Counter()
or a collections.defaultdict()
object instead:
import collections
dct = collections.defaultdict(int)
for c in string:
dict[c] += 1
or
dct = collections.Counter(string)
Both are subclasses of the standard dict
type. The Counter
type adds some more helpful functionality like summing two counters or listing the most common entities that have been counted. The defaultdict
class can also be given other default types; use defaultdict(list)
for example to collect things into lists per key.
When you want to compare performance of two different approaches, you want to use the timeit
module:
>>> import timeit
>>> def intest(dct, values):
... for c in values:
... if c in dct:
... dct[c]+=1
... else:
... dct[c]=1
...
>>> def get(dct, values):
... for c in values:
... dct[c] = dct.get(c, 0) + 1
...
>>> values = range(10) * 10
>>> timeit.timeit('test(dct, values)', 'from __main__ import values, intest as test; dct={}')
22.210275888442993
>>> timeit.timeit('test(dct, values)', 'from __main__ import values, get as test; dct={}')
27.442166090011597
This shows that using in
is a little faster.
There is, however, a third option to consider; catching the KeyError
exception:
>>> def tryexcept(dct, values):
... for c in values:
... try:
... dct[c] += 1
... except KeyError:
... dct[c] = 1
...
>>> timeit.timeit('test(dct, values)', 'from __main__ import values, tryexcept as test; dct={}')
18.023509979248047
which happens to be the fastest, because only 1 in 10 cases are for a new key.
Last but not least, the two alternatives I proposed:
>>> def default(dct, values):
... for c in values:
... dct[c] += 1
...
>>> timeit.timeit('test(dct, values)', 'from __main__ import values, default as test; from collections import defaultdict; dct=defaultdict(int)')
15.277361154556274
>>> timeit.timeit('Counter(values)', 'from __main__ import values; from collections import Counter')
38.657804012298584
So the Counter()
type is slowest, but defaultdict
is very fast indeed. Counter()
s do a lot more work though, and the extra functionality can bring ease of development and execution speed benefits elsewhere.
Upvotes: 6