Clip

Reputation: 3078

Python Get top values in dictionary

I have a dictionary called wordCounts that maps each word to the number of times it occurred. How can I get the top n words in the dict, while allowing more than n if there is a tie?

Upvotes: 1

Views: 3667

Answers (3)

Patrick Haugh

Reputation: 61032

MooingRawr is on the right track, but we still need to keep only the top n results, plus anything tied with the last one:

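# d is the word -> count dictionary; n (>= 1) is how many top entries are wanted
l = []
for i, (word, count) in enumerate(sorted(d.items(), reverse=True, key=lambda x: x[1])):
    # after n entries, stop as soon as the count drops below the last one kept
    if i >= n and count < l[-1][1]:
        break
    l.append((word, count))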
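
For anyone who wants to paste and run this, here is a minimal sketch of the same loop wrapped in a function. The name top_n_with_ties, the sample dictionary, and the early return for n <= 0 are my own additions for illustration, not part of the snippet above.

```python
def top_n_with_ties(word_counts, n):
    """Return the n highest (word, count) pairs, plus any pairs tied with the last one kept."""
    if n <= 0:
        return []  # assumption: asking for zero or fewer entries yields nothing
    ranked = sorted(word_counts.items(), key=lambda item: item[1], reverse=True)
    result = []
    for i, (word, count) in enumerate(ranked):
        # Past the first n entries, continue only while the count ties
        # with the last entry already kept.
        if i >= n and count < result[-1][1]:
            break
        result.append((word, count))
    return result


# Hypothetical sample data
word_counts = {'a': 3, 'b': 3, 'c': 2, 'd': 1, 'e': 0, 'f': 1}
print(top_n_with_ties(word_counts, 4))
# [('a', 3), ('b', 3), ('c', 2), ('d', 1), ('f', 1)]
```

Asking for 4 returns 5 entries here because 'd' and 'f' are tied for fourth place.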

Upvotes: 1

Nf4r

Reputation: 1410

One solution could be:

from collections import Counter, defaultdict


list_of_words = ['dog', 'cat', 'moo', 'dog', 'pun', 'pun']
def get_n_most_common(n, list_of_words):
    ct = Counter(list_of_words)
    d = defaultdict(list)
    for word, quantity in ct.items():
        d[quantity].append(word)
    most_common = sorted(d.keys(), reverse=True)
    return [(word, val) for val in most_common[:n] for word in d[val]]

And the tests:

 >> get_n_most_common(2, list_of_words)
 => [('pun', 2), ('dog', 2), ('moo', 1), ('cat', 1)]
 >> get_n_most_common(1, list_of_words)
 => [('pun', 2), ('dog', 2)]
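
Note that in the function above n counts distinct frequency levels rather than words, which is why n=2 returns four words. If you want "at least n words, extended only by ties", the same grouping idea can be adapted roughly as follows. This is a sketch; top_n_words_grouped is a hypothetical name I made up for it.

```python
from collections import Counter, defaultdict


def top_n_words_grouped(n, list_of_words):
    # Group words by how often they occur.
    groups = defaultdict(list)
    for word, quantity in Counter(list_of_words).items():
        groups[quantity].append(word)

    result = []
    # Walk the frequency levels from highest to lowest.
    for quantity in sorted(groups, reverse=True):
        if len(result) >= n:
            break  # n words already collected; lower levels are not ties
        # Take the whole level, so every word tied at this count is included.
        result.extend((word, quantity) for word in groups[quantity])
    return result


list_of_words = ['dog', 'cat', 'moo', 'dog', 'pun', 'pun']
print(top_n_words_grouped(2, list_of_words))  # only 'dog' and 'pun'
print(top_n_words_grouped(3, list_of_words))  # all four words, since 'cat' and 'moo' tie for third
```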

Upvotes: 1

brianpck

Reputation: 8254

As the previous answer says, you can convert the dict to a Counter to make this data easier to work with.

>>> from collections import Counter
>>> d = {"d":1,"c":2,"a":3,'b':3,'e':0,'f':1}
>>> c = Counter(d)
>>> c
Counter({'b': 3, 'a': 3, 'c': 2, 'f': 1, 'd': 1, 'e': 0})

Counter has a most_common(n) method that returns the n most common elements. Note that it cuts off at exactly n, so entries tied with the nth element are dropped. Below, another key with a count of 1 is left out:

>>> c.most_common(4)
[('b', 3), ('a', 3), ('c', 2), ('f', 1)]

To include all values equal to the nth element, you can do something like the following, without converting to a Counter. This is pretty messy, but it should do the trick.

def most_common_inclusive(freq_dict, n):
    # find the nth most common value
    nth_most_common = sorted(freq_dict.values(), reverse=True)[n-1]
    return { k: v for k, v in freq_dict.items() if v >= nth_most_common }

You can use it as follows:

>>> d = {'b': 3, 'a': 3, 'c': 2, 'f': 1, 'd': 1, 'e': 0}
>>> most_common_inclusive(d, 4)
{'d': 1, 'b': 3, 'c': 2, 'f': 1, 'a': 3}
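
If the dictionary is large and n is small, sorting every value just to find the cutoff may be more work than needed. One possible variation, sketched below, uses heapq.nlargest from the standard library to find the threshold instead. The function name most_common_inclusive_heap and the handling of n <= 0 and oversized n are my own assumptions, not part of the answer above.

```python
import heapq


def most_common_inclusive_heap(freq_dict, n):
    if n <= 0 or not freq_dict:
        return {}  # assumption: nothing requested or nothing to rank
    # nlargest returns at most n values in descending order,
    # so its last element is the cutoff count.
    threshold = heapq.nlargest(n, freq_dict.values())[-1]
    # Keep every entry at or above the cutoff, which includes ties.
    return {k: v for k, v in freq_dict.items() if v >= threshold}


d = {'b': 3, 'a': 3, 'c': 2, 'f': 1, 'd': 1, 'e': 0}
print(most_common_inclusive_heap(d, 4))
```

When n is larger than the number of entries, this version returns the whole dictionary instead of raising an IndexError.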

Upvotes: 3
