Michael B

Reputation: 578

Counter.most_common(n): how to override the arbitrary ordering

Can I do the ranking/sorting with Counter.most_common() alone, and so avoid this line: d = sorted(d.items(), key=lambda x: (-x[1],x[0]), reverse=False)?

Challenge: You are given a string. The string contains only lowercase English alphabet characters. Your task is to find the top three most common characters in the string.

Output Format: Print the three most common characters along with their occurrence count each on a separate line. Sort output in descending order of occurrence count. If the occurrence count is the same, sort the characters in ascending order.

To complete this I used dict, Counter, and sorted to ensure that "if the occurrence count is the same, sort the characters in ascending order". Python's built-in sorted, with the key shown in the code, orders by count and then alphabetically. I'm curious whether there is a way to override the default arbitrary tie-breaking in Counter.most_common(), as it seems to disregard the lexicographical order of the results when picking the top 3.

import sys
from collections import Counter

string = sys.stdin.readline().strip()
d = dict(Counter(string).most_common(3))
d = sorted(d.items(), key=lambda x: (-x[1],x[0]), reverse=False)

for letter, count in d[:3]:
    print letter, count

Upvotes: 6

Views: 2734

Answers (1)

smci

Reputation: 33940

Yes, the doc explicitly says that the tie-breaker order of Counter.most_common(), used when counts are equal, is arbitrary.

  • UPDATE: PM2Ring told me Counter inherits dict's ordering. Insertion ordering only happens in 3.6+, and is only guaranteed from 3.7. It's possible the doc is lagging.
  • In CPython 3.6+, ties fall back on original insertion order (see bottom), but don't rely on that implementation detail because, per the spec, it's not defined behavior. Best to do your own sort, as you say, if you want totally deterministic behavior.
  • I show at bottom how you can monkey-patch Counter.most_common with your own sort function like the one you show, but that's frowned on. (Code you write might accidentally rely on the patch and hence break wherever it isn't applied.)
  • You could subclass Counter as MyCounter so you can override its most_common. Painful and not really portable.
  • Really the best approach is just to write code and tests that don't rely on the arbitrary tie-breaker order from most_common().
  • I agree that the ordering of most_common() should not have been hardwired, and that we should be able to pass a sort key or comparison function into __init__().
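The subclassing option from the list above might look something like this minimal sketch. The class name SortedTieCounter is hypothetical, the code is written for Python 3, and it assumes the keys can be compared with each other (as single characters can).

```python
from collections import Counter


class SortedTieCounter(Counter):
    """Hypothetical Counter subclass: ties in most_common() go to the smaller key."""

    def most_common(self, n=None):
        # Sort by descending count, then ascending key, before truncating.
        ranked = sorted(self.items(), key=lambda item: (-item[1], item[0]))
        return ranked if n is None else ranked[:n]


top = SortedTieCounter("ccbaab").most_common(2)
print(top)  # [('a', 2), ('b', 2)]
```

Only code that constructs a SortedTieCounter gets this ordering; anything that builds a plain Counter still sees the default tie-break, which is part of why this approach is fragile.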

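Monkey-patching Counter.most_common():

import collections

def patched_most_common(self, n=None):
    # Sort by descending count, then ascending key, so ties are deterministic.
    ranked = sorted(self.items(), key=lambda x: (-x[1], x[0]))
    return ranked if n is None else ranked[:n]

collections.Counter.most_common = patched_most_common

# Counter's repr calls most_common(), so it reflects the patched order:
collections.Counter('ccbaab')
Counter({'a': 2, 'b': 2, 'c': 2})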

Demonstrating that in CPython 3.7 (without the patch), the arbitrary order is order of insertion (first insertion of each character):

Counter('abccba').most_common()
[('a', 2), ('b', 2), ('c', 2)]

Counter('ccbaab').most_common()
[('c', 2), ('b', 2), ('a', 2)]
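For the challenge in the question, the "do your own sort" route could be sketched roughly as follows, without patching or subclassing. The helper name top_chars is hypothetical and the code is written for Python 3. The point it illustrates is that the full set of counts is sorted before the top three are taken, so the tie-break is applied before truncation rather than after a call to most_common(3).

```python
from collections import Counter


def top_chars(text, n=3):
    """Return the n most frequent characters, ties broken alphabetically."""
    counts = Counter(text)
    # Rank every (char, count) pair first: descending count, then ascending char.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Truncate only after the deterministic ordering has been applied.
    return ranked[:n]


for letter, count in top_chars("ccbaab"):
    print(letter, count)
```

With the input 'ccbaab' from the demonstration above, this prints a, b and c in alphabetical order, each with a count of 2, regardless of insertion order.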

Upvotes: 7
