Reputation: 87
d=[[(u'BAKING', 51)], [(u'ACCESS', 4)],[(u'CUTE', 2)], [(u'RED', 3)],[(u'FINE', 59)], [(u'ACCESS', 49)],[(u'YOU', 97)], [(u'THANK', 41)]]
I have a list of (word, frequency) tuples, each wrapped in its own one-element list. How can I find the three words with the highest frequency?
from collections import Counter

t = []
for items in d:
    k = items[0]
    print len(k)
    for j in k:
        t.append(j)
print t
m = [t[i:i+2] for i in range(0, len(t), 2)]
print m
j = Counter(m)
This gives me an error: m is a list, but it should be a dictionary. How can I convert it into a dictionary?
Upvotes: 2
Views: 272
Reputation: 26017
You can use itemgetter and itertools.chain to get this task done:
from operator import itemgetter
from itertools import chain
sorted(list(chain.from_iterable(d)), key=itemgetter(1), reverse=True)[0:3]
This will give you:
[(u'YOU', 97), (u'FINE', 59), (u'BAKING', 51)]
Some explanation: chain.from_iterable flattens your list of lists, so that you end up with a flat list of tuples (which is easier to handle than the list of lists). This list is then sorted in descending order by the second element of each tuple using itemgetter(1), and the slice [0:3] selects the first three elements.
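The same steps can be written out one at a time, which may make it easier to see what each part does. This is only a sketch in Python 3 syntax; the names flat, by_count and top3 are illustrative and not part of the original one-liner:

```python
from itertools import chain
from operator import itemgetter

d = [[('BAKING', 51)], [('ACCESS', 4)], [('CUTE', 2)], [('RED', 3)],
     [('FINE', 59)], [('ACCESS', 49)], [('YOU', 97)], [('THANK', 41)]]

# Step 1: flatten the list of one-element lists into a flat list of tuples.
flat = list(chain.from_iterable(d))

# Step 2: sort by the second tuple element (the frequency), largest first.
by_count = sorted(flat, key=itemgetter(1), reverse=True)

# Step 3: keep the first three entries.
top3 = by_count[:3]
print(top3)
```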
EDIT:
Just read your comment about multiple entries. One way to do it is the following:
import collections
from operator import itemgetter
from itertools import chain
result_dict = collections.defaultdict(list)
newL = list(chain.from_iterable(d))
for tu in newL:
    result_dict[tu[0]].append(tu[1])
This will give you
defaultdict(<type 'list'>, {u'CUTE': [2], u'BAKING': [51], u'THANK': [41], u'ACCESS': [4, 49], u'YOU': [97], u'FINE': [59], u'RED': [3]})
Now you can get the sum of the entries in each list like this:
res = {k: sum(v) for k,v in result_dict.iteritems()}
and the top three items like this:
sorted(res.iteritems(), key=itemgetter(1), reverse=True)[0:3]
In this case it is:
[(u'YOU', 97), (u'FINE', 59), (u'ACCESS', 53)]
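Since the question tried collections.Counter, here is a possible alternative that sums repeated words directly in a Counter and then asks it for the three largest totals. Treat it as a sketch in Python 3 syntax rather than a drop-in replacement for the code above; the names totals and top3 are made up for the example:

```python
from collections import Counter
from itertools import chain

d = [[('BAKING', 51)], [('ACCESS', 4)], [('CUTE', 2)], [('RED', 3)],
     [('FINE', 59)], [('ACCESS', 49)], [('YOU', 97)], [('THANK', 41)]]

# Accumulate the frequency of every word; repeated words are added together.
totals = Counter()
for word, count in chain.from_iterable(d):
    totals[word] += count

# most_common(n) returns the n (word, total) pairs with the largest totals.
top3 = totals.most_common(3)
print(top3)
```

Counter keys must be hashable, which is why passing a list of [word, count] lists to Counter raises an error, while using the word strings as keys works.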
Upvotes: 2
Reputation: 160
I prefer:
sorted(d, key = lambda x: x[0][1], reverse = True)
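Note that this sorts the whole outer list and leaves each tuple wrapped in its one-element list, and it does not merge repeated words such as ACCESS. A minimal sketch of how the result could be cut down to the top three bare tuples (Python 3 syntax, variable names are illustrative):

```python
d = [[('BAKING', 51)], [('ACCESS', 4)], [('CUTE', 2)], [('RED', 3)],
     [('FINE', 59)], [('ACCESS', 49)], [('YOU', 97)], [('THANK', 41)]]

# Sort the inner lists by the frequency of their single tuple, largest first.
ranked = sorted(d, key=lambda x: x[0][1], reverse=True)

# Take the first three inner lists and unwrap the tuple from each.
top3 = [inner[0] for inner in ranked[:3]]
print(top3)
```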
Upvotes: 2