Why does a dictionary count in some cases faster than collections.Counter?

Question

I needed a solution for extracting unique elements from a non-unique list along with counting its duplicate elements.

The purpose of the solution was to use it in an algorithm for creating unique combinations from a non-unique list. The list sizes from which combinations are created in such case are usually very small (less than 50 elements), but my goal was to find the overall fastest code trying to optimize whenever and wherever possible (even if gaining only very tiny amount of running time).

Pythons collections module provides an exactly for such purpose suited specialized collections.Counter, but there are apparently cases in which usage of a simple dictionary instead of collections.Counter leads to faster code like you can check out yourself using the code below:

from time   import time   as t
from timeit import timeit as tt

from collections import Counter
def counter(iterable):
    dctCounter = {}
    for item in iterable:
        if item in dctCounter: 
            dctCounter[item] += 1
        else: 
            dctCounter[item]  = 1
    return dctCounter

for n, N in [(1,10), (10,1), (1,50), (50,1), (1,100), (100,1), (1,200), (200, 1), (1, 500), (500, 1), (1, 1000), (1000,1)]:
    lstItems = n*list(range(N))
    for noLoops in [10**p for p in range(5, 6)]: 
        s = t()
        for _ in range(noLoops): 
            dctCounter = counter(lstItems)
        e = t()
        timeDctFctn = e - s
        s = t()
        for _ in range(noLoops): 
            objCounter = Counter(lstItems)
        e = t()
        timeCollCtr = e - s
        timeitCollCtr = tt("objCounter=Counter(lstItems)", "from __main__ import Counter, lstItems", number=noLoops)
        timeitDctFctn = tt("dctCounter=counter(lstItems)", "from __main__ import counter, lstItems", number=noLoops)
        # print("Loops: {:7}, time/timeit CollCtr: {:7.5f}/{:7.5f} DctFctn: {:7.5f}/{:7.5f} sec. lstSize: {:3}, %uniq: {:3.0f}".format(noLoops, timeCollCtr, timeitCollCtr, timeDctFctn, timeitDctFctn, n*N, 100.0/n))
        print("collections.Counter(): {:7.5f},  def counter(): {:7.5f} sec. lstSize: {:3}, %uniq: {:3.0f}, ({} timitLoops)".format(timeitCollCtr, timeitDctFctn, n*N, 100.0/n, noLoops))    
        # print('-----------------------------------------------------------------------------------------------------------')

Here the output:

python3.6 -u "collections.Counter-vs-dictionaryAsCounter_Cg.py"
collections.Counter(): 0.36461, def counter(): 0.09592 sec. lstSize:  10, %uniq: 100, (100000 timitLoops)
collections.Counter(): 0.36444, def counter(): 0.12286 sec. lstSize:  10, %uniq:  10, (100000 timitLoops)
collections.Counter(): 0.58627, def counter(): 0.43233 sec. lstSize:  50, %uniq: 100, (100000 timitLoops)
collections.Counter(): 0.52399, def counter(): 0.54106 sec. lstSize:  50, %uniq:   2, (100000 timitLoops)
collections.Counter(): 0.82332, def counter(): 0.81436 sec. lstSize: 100, %uniq: 100, (100000 timitLoops)
collections.Counter(): 0.72513, def counter(): 1.06823 sec. lstSize: 100, %uniq:   1, (100000 timitLoops)
collections.Counter(): 1.27130, def counter(): 1.59476 sec. lstSize: 200, %uniq: 100, (100000 timitLoops)
collections.Counter(): 1.13817, def counter(): 2.14566 sec. lstSize: 200, %uniq:   0, (100000 timitLoops)
collections.Counter(): 3.16287, def counter(): 4.26738 sec. lstSize: 500, %uniq: 100, (100000 timitLoops)
collections.Counter(): 2.64247, def counter(): 5.67448 sec. lstSize: 500, %uniq:   0, (100000 timitLoops)
collections.Counter(): 4.89153, def counter(): 7.68661 sec. lstSize:1000, %uniq: 100, (100000 timitLoops)
collections.Counter(): 6.06389, def counter():13.92613 sec. lstSize:1000, %uniq:   0, (100000 timitLoops)
>Exit code: 0

P.S. It seems that collections.Counter() fails to be up to the expectations also in another context as this above. See here: https://stackoverflow.com/questions/41594940/why-is-collections-counter-so-slow

MSeifert · Accepted Answer

Counter has one major bottleneck when you count short iterables: It checks if isinstance(iterable, Mapping). This test is rather slow because collections.abc.Mapping is an abstract metaclass and as such the isinstance-check is a bit more complicated than normal isinstance-checks, see for example: "Why is checking isinstance(something, Mapping) so slow?"

So it isn't really surprising if other approaches are faster for short iterables. However for long iterables the check doesn't matter much and Counter should be faster (at least for python-3.x (CPython) where the actual counting function _count_elements is written in c).

An easy way to identify bottlenecks is profiling. I'll use line_profiler here:

%load_ext line_profiler

from collections import Counter
x = range(50)
# Profile the function Counter.update when executing the command "Counter(x)"
%lprun -f Counter.update Counter(x)

The result:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   604         1            8      8.0      3.9          if not args:
   605                                                       raise TypeError("descriptor 'update' of 'Counter' object "
   606                                                                       "needs an argument")
   607         1           13     13.0      6.4          self, *args = args
   608         1            6      6.0      3.0          if len(args) > 1:
   609                                                       raise TypeError('expected at most 1 arguments, got %d' % len(args))
   610         1            5      5.0      2.5          iterable = args[0] if args else None
   611         1            3      3.0      1.5          if iterable is not None:
   612         1           94     94.0     46.3              if isinstance(iterable, Mapping):
   613                                                           if self:
   614                                                               self_get = self.get
   615                                                               for elem, count in iterable.items():
   616                                                                   self[elem] = count + self_get(elem, 0)
   617                                                           else:
   618                                                               super(Counter, self).update(iterable) # fast path when counter is empty
   619                                                       else:
   620         1           69     69.0     34.0                  _count_elements(self, iterable)
   621         1            5      5.0      2.5          if kwds:
   622                                                       self.update(kwds)

So the time it takes to initialize a Counter from an iterable has a rather big constant factor (46% of the time are spent on the isinstance check, getting the dictionary with the counts takes only 34%).

However for long iterables it doesn't matter (much) because it's just done once:

%lprun -f Counter.update Counter([1]*100000)

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   604         1           12     12.0      0.0          if not args:
   605                                                       raise TypeError("descriptor 'update' of 'Counter' object "
   606                                                                       "needs an argument")
   607         1           12     12.0      0.0          self, *args = args
   608         1            6      6.0      0.0          if len(args) > 1:
   609                                                       raise TypeError('expected at most 1 arguments, got %d' % len(args))
   610         1            6      6.0      0.0          iterable = args[0] if args else None
   611         1            3      3.0      0.0          if iterable is not None:
   612         1           97     97.0      0.3              if isinstance(iterable, Mapping):
   613                                                           if self:
   614                                                               self_get = self.get
   615                                                               for elem, count in iterable.items():
   616                                                                   self[elem] = count + self_get(elem, 0)
   617                                                           else:
   618                                                               super(Counter, self).update(iterable) # fast path when counter is empty
   619                                                       else:
   620         1        28114  28114.0     99.5                  _count_elements(self, iterable)
   621         1           13     13.0      0.0          if kwds:
   622                                                       self.update(kwds)

Just to give you an overview how these perform depending on the number of elements, for comparison I included an optimized version of your count and the _count_elements function that is used by Counter. However I excluded the part where you sorted the items and created a list of the counts to avoid other effects - especially because sorted has a different run-time behavior (O(n log(n))) than counting (O(n)):

# Setup

import random
from collections import Counter, _count_elements

def count(iterable):
    """Explicit iteration over items."""
    dctCounter = {}
    for item in iterable:
        if item in dctCounter: 
            dctCounter[item] += 1
        else: 
            dctCounter[item]  = 1
    return dctCounter

def count2(iterable):
    """Iterating over the indices"""
    dctCounter = {}
    lenLstItems = len(iterable)
    for idx in range(lenLstItems):
        item = iterable[idx]
        if item in dctCounter.keys(): 
            dctCounter[item] += 1
        else: 
            dctCounter[item]  = 1
    return dctCounter

def c_count(iterable):
    """Internal counting function that's used by Counter"""
    d = {}
    _count_elements(d, iterable)
    return d

# Timing

timings = {Counter: [], count: [], count2: [], c_count: []}

for i in range(1, 20):
    print(2**i)
    it = [random.randint(0, 2**i) for _ in range(2**i)]
    for func in (Counter, count, count2, c_count):
        res = %timeit -o func(it)
        timings[func].append(res)

# Plotting

%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(1)
ax = plt.subplot(111)

n = 2**np.arange(1, 5)

ax.plot(n, 
        [time.average for time in timings[count]], 
        label='my custom function', c='red')
ax.plot(n, 
        [time.average for time in timings[count2]], 
        label='your custom function', c='green')
ax.plot(n, 
        [time.average for time in timings[Counter]], 
        label='Counter', c='blue')
ax.plot(n, 
        [time.average for time in timings[c_count]], 
        label='_count_elements', c='purple')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('elements')
ax.set_ylabel('time to count them [seconds]')
ax.grid(which='both')
ax.legend()
plt.tight_layout()

And the result:

Individual timings:

2
30.5 µs ± 177 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1.67 µs ± 3.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
6.03 µs ± 31.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1.67 µs ± 1.45 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
4
30.7 µs ± 75.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2.63 µs ± 25.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
7.81 µs ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1.97 µs ± 5.59 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
8
34.3 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
4.3 µs ± 16.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11.3 µs ± 23.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
3 µs ± 25.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
16
34.2 µs ± 599 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
7.46 µs ± 42 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
17.5 µs ± 83.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.24 µs ± 17.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
32
38.4 µs ± 578 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
13.7 µs ± 95.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
29.8 µs ± 383 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
7.56 µs ± 90.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
64
43.5 µs ± 1.92 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
24 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
52.8 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
11.6 µs ± 83.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
128
53.5 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
47.8 µs ± 507 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
101 µs ± 3.25 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
21.7 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
256
69.6 µs ± 239 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
92.1 µs ± 1.94 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
188 µs ± 3.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
39.5 µs ± 1.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
512
123 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
200 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
409 µs ± 3.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
90.9 µs ± 958 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1024
230 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
428 µs ± 5.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
855 µs ± 14.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
193 µs ± 2.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2048
436 µs ± 7.11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
868 µs ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.76 ms ± 31.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
386 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
4096
830 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.8 ms ± 33.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.75 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.06 ms ± 89.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
8192
2.3 ms ± 246 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.8 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.8 ms ± 425 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.69 ms ± 100 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
16384
4.53 ms ± 349 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.22 ms ± 359 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
15.9 ms ± 371 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.9 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
32768
9.6 ms ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
17.2 ms ± 51.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
32.5 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
10.4 ms ± 687 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
65536
24.8 ms ± 490 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
40.1 ms ± 710 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
66.8 ms ± 271 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
24.5 ms ± 659 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
131072
54.6 ms ± 756 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
84.2 ms ± 728 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
142 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.1 ms ± 424 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
262144
120 ms ± 4.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
182 ms ± 4.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
296 ms ± 3.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
117 ms ± 320 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
524288
244 ms ± 3.31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
368 ms ± 2.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
601 ms ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
252 ms ± 6.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Why does a dictionary count in some cases faster than collections.Counter?

Answers (1)

Related Questions