Enes Dal
Enes Dal

Reputation: 15

How to get a set of keys with largest values?

I am working on a function

def common_words(dictionary, N):
     if len(dictionary) > N:
         max(dictionary, key=dictionary.get)

Description of the function is:

The first parameter is the dictionary of word counts and the second is a positive integer N. This function should update the dictionary so that it includes the most common (highest frequency words). At most N words should be included in the dictionary. If including all words with some word count would result in a dictionary with more than N words, then none of the words with that word count should be included. (i.e., in the case of a tie for the N+1st most common word, omit all of the words in the tie.)

So I know that I need to get the N items with the highest values but I am not sure how to do that. I also know that once I get N items that if there are any duplicate values that I need to pop them out.


For example, given

k = {'a':5, 'b':4, 'c':4, 'd':1} 

then

common_words(k, 2) 

should modify k so that it becomes {'a':5}.

Upvotes: 0

Views: 180

Answers (2)

Skycc
Skycc

Reputation: 3555

my algorithm as below

  1. 1st build tuple list from dictionary sorted based on value from largest to smallest
  2. check for if item[N-1] match item[N] value, if yes, drop item[N-1] (index start from 0, so -1 there)
  3. finally, convert the slice of tuple list up to N element back to dict, may change to use OrderedDict here if wanna retain the items order

it will just return the dictionary as it is if the dictionary length is less than N

def common_words(dictionary, N):
    if len(dictionary) > N:
        tmp = [(k,dictionary[k]) for k in sorted(dictionary, key=dictionary.get, reverse=True)]
        if tmp[N-1][1] == tmp[N][1]:
            N -= 1
        return dict(tmp[:N])
        # return [i[0] for i in tmp[:N]] # comment line above and uncomment this line to get keys only as your title mention how to get keys
    else:
        return dictionary
        # return dictionary.keys() # comment line above and uncomment this line to get keys only as your title mention how to get keys

>>> common_words({'a':5, 'b':4, 'c':4, 'd':1}, 2)
{'a': 5}

OP wanna modify input dictionary within function and return None, it can be modified as below

def common_words(dictionary, N):
    if len(dictionary) > N:
        tmp = [(k,dictionary[k]) for k in sorted(dictionary, key=dictionary.get, reverse=True)]
        if tmp[N-1][1] == tmp[N][1]:
            N -= 1
        # return dict(tmp[:N])
        for i in tmp[N:]:
            dictionary.pop(i[0])

>>> k = {'a':5, 'b':4, 'c':4, 'd':1}
>>> common_words(k, 2)
>>> k
{'a': 5}

Upvotes: 0

PM 2Ring
PM 2Ring

Reputation: 55469

Here's my algorithm for this problem.

  1. Extract the data from the dictionary into a list and sort it in descending order on the dictionary values.
  2. Clear the original dictionary.
  3. Group the sorted data into groups that have the same value.
  4. Re-populate the dictionary with the all (key, value) pairs from each group in the sorted list if that will keep the total dictionary size <= N. If adding a group would make the total dictionary size > N, then return.

The grouping operation can be easily done using the standard itertools.groupby function.

To perform the sorting and grouping we need an appropriate key function, as described in the groupby, list and sorted docs. Since we need the second item of each tuple we could use

def keyfunc(t):
    return t[1]

or

keyfunc = lambda t: t[1]

but it's more efficient to use operator.itemgetter.


from operator import itemgetter
from itertools import groupby

def common_words(d, n):
    keyfunc = itemgetter(1)
    lst = sorted(d.items(), key=keyfunc, reverse=True)
    d.clear()
    for _, g in groupby(lst, key=keyfunc):
        g = list(g)
        if len(d) + len(g) <= n:
            d.update(g)
        else:
            break

# test

data = {'a':5, 'b':4, 'c':4, 'd':1} 

common_words(data, 4)
print(data)
common_words(data, 2)
print(data)

output

{'c': 4, 'd': 1, 'b': 4, 'a': 5}
{'a': 5}

Upvotes: 2

Related Questions