removing least common elements from Counter

Question

Is there any "faster way" to remove key, value pairs from Counter where value is less than certain value?

I've done the following:

counter_dict = {k:v for k, v in counter_dict.items() if v > 5}

Anshul Goyal · Accepted Answer

The major issue with the current code is the call to .items, which will create a list of all items:

One optimization could be to use Counter.iteritems instead of .items, to save the penalty of creating a list and iterating through it again.

>>> from collections import Counter
>>> cnt = Counter("asbdasdbasdbadaasasdasadsa")
>>> {k:v for k,v in cnt.iteritems() if v > 5}
{'a': 10, 's': 7, 'd': 6}

Another optimization could be to not call the .items method, and instead iterate on the keys and access the values using the key:

>>> from collections import Counter
>>> cnt = Counter("asbdasdbasdbadaasasdasadsa")
>>> {k:cnt[k] for k in cnt if cnt[k] > 5}
{'a': 10, 's': 7, 'd': 6}

If we try to measure the difference with %timeit in ipython, using a sample Counter with your mentioned if condition, iteritems wins hands down:

In [1]: import random

In [2]: from collections import Counter

In [3]: MILLION = 10**6

In [4]: cnt = Counter(random.randint(0, MILLION) for _ in xrange(MILLION))

In [5]: %timeit {k:v for k, v in cnt.iteritems() if v < 5}
10 loops, best of 3: 140 ms per loop

In [6]: %timeit {k:v for k, v in cnt.items() if v**2 < 5}
1 loops, best of 3: 290 ms per loop

In [7]: %timeit {k:cnt[k] for k in cnt if cnt[k] < 5}
1 loops, best of 3: 272 ms per loop

With change of conditions:

In [8]: %timeit {k:v for k, v in cnt.iteritems() if v > 5}
10 loops, best of 3: 87 ms per loop

In [9]: %timeit {k:v for k, v in cnt.items() if v > 5}
1 loops, best of 3: 186 ms per loop

In [10]: %timeit {k:cnt[k] for k in cnt if cnt[k] > 5}
10 loops, best of 3: 153 ms per loop

removing least common elements from Counter

Answers (2)

Related Questions