Reputation: 15
I am working on a function
def common_words(dictionary, N):
if len(dictionary) > N:
max(dictionary, key=dictionary.get)
Description of the function is:
The first parameter is the dictionary of word counts and the second is a positive integer N. This function should update the dictionary so that it includes the most common (highest frequency words). At most N words should be included in the dictionary. If including all words with some word count would result in a dictionary with more than N words, then none of the words with that word count should be included. (i.e., in the case of a tie for the N+1st most common word, omit all of the words in the tie.)
So I know that I need to get the N items with the highest values but I am not sure how to do that. I also know that once I get N items that if there are any duplicate values that I need to pop them out.
For example, given
k = {'a':5, 'b':4, 'c':4, 'd':1}
then
common_words(k, 2)
should modify k
so that it becomes {'a':5}
.
Upvotes: 0
Views: 180
Reputation: 3555
my algorithm as below
OrderedDict
here if wanna retain the items orderit will just return the dictionary as it is if the dictionary length is less than N
def common_words(dictionary, N):
if len(dictionary) > N:
tmp = [(k,dictionary[k]) for k in sorted(dictionary, key=dictionary.get, reverse=True)]
if tmp[N-1][1] == tmp[N][1]:
N -= 1
return dict(tmp[:N])
# return [i[0] for i in tmp[:N]] # comment line above and uncomment this line to get keys only as your title mention how to get keys
else:
return dictionary
# return dictionary.keys() # comment line above and uncomment this line to get keys only as your title mention how to get keys
>>> common_words({'a':5, 'b':4, 'c':4, 'd':1}, 2)
{'a': 5}
OP wanna modify input dictionary within function and return None
, it can be modified as below
def common_words(dictionary, N):
if len(dictionary) > N:
tmp = [(k,dictionary[k]) for k in sorted(dictionary, key=dictionary.get, reverse=True)]
if tmp[N-1][1] == tmp[N][1]:
N -= 1
# return dict(tmp[:N])
for i in tmp[N:]:
dictionary.pop(i[0])
>>> k = {'a':5, 'b':4, 'c':4, 'd':1}
>>> common_words(k, 2)
>>> k
{'a': 5}
Upvotes: 0
Reputation: 55469
Here's my algorithm for this problem.
The grouping operation can be easily done using the standard itertools.groupby
function.
To perform the sorting and grouping we need an appropriate key function, as described in the groupby
, list
and sorted
docs. Since we need the second item of each tuple we could use
def keyfunc(t):
return t[1]
or
keyfunc = lambda t: t[1]
but it's more efficient to use operator.itemgetter
.
from operator import itemgetter
from itertools import groupby
def common_words(d, n):
keyfunc = itemgetter(1)
lst = sorted(d.items(), key=keyfunc, reverse=True)
d.clear()
for _, g in groupby(lst, key=keyfunc):
g = list(g)
if len(d) + len(g) <= n:
d.update(g)
else:
break
# test
data = {'a':5, 'b':4, 'c':4, 'd':1}
common_words(data, 4)
print(data)
common_words(data, 2)
print(data)
output
{'c': 4, 'd': 1, 'b': 4, 'a': 5}
{'a': 5}
Upvotes: 2