Reputation: 2542
I have the following list of tuples, each made up of (Category, Tag, TagCount). They are ordered by Category and then by descending TagCount.
[(u'Agriculture', u'Farming', 3L), (u'Agriculture', u'Business', 2L), (u'Agriculture', u'Animal', 2L), (u'Agriculture', u'Illness', 1L), (u'Agriculture', u'Health', 1L), (u'Agriculture', u'Disability', 1L),
(u'Agriculture', u'Carers', 1L), (u'Employment', u'Money', 1L), (u'Employment', u'Business', 1L),
(u'Employment', u'Tax', 1L), (u'Employment', u'Debt', 1L), (u'Employment', u'Budget', 1L),
(u'Environment', u'Business', 2L), (u'Environment', u'Animal', 2L), (u'Environment', u'Trees', 2L)]
I want to be able to get the top 3 tuples in each category, so I want to return:
[(u'Agriculture', u'Farming', 3L), (u'Agriculture', u'Business', 2L), (u'Agriculture', u'Animal', 2L),
(u'Employment', u'Money', 1L), (u'Employment', u'Business', 1L), (u'Employment', u'Tax', 1L),
(u'Environment', u'Business', 2L), (u'Environment', u'Animal', 2L), (u'Environment', u'Trees', 2L)]
I know I can get this working with for loops and counters, but I feel there might be an easier way that I am completely missing, perhaps using lambdas.
Here is what I have that works:
output = []
counter = 0
last_category = None
for res in results:
    category = res[0]
    # Reset the counter whenever a new category starts.
    if category != last_category:
        counter = 0
        last_category = category
    # Keep only the first three rows of each category.
    if counter < 3:
        output.append(res)
        counter += 1
Upvotes: 0
Views: 253
Reputation: 78554
You can group the items by category first and then slice the first 3 items from each group:
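from itertools import groupby, islice
from operator import itemgetter
from pprint import pprint

# lst is the list of tuples, already sorted by category and descending count
f = itemgetter(0)  # group key: the category
r = [i for _, g in groupby(lst, f) for i in islice(g, 3)]
pprint(r)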
For the general case, where the items are not already sorted (by category and count), do an initial sort first:
lst = sorted(lst, key=lambda x: (x[0], -x[2]))
This sorts by category, then by descending count. For the sample data, pprint(r) prints:
[(u'Agriculture', u'Farming', 3L),
(u'Agriculture', u'Business', 2L),
(u'Agriculture', u'Animal', 2L),
(u'Employment', u'Money', 1L),
(u'Employment', u'Business', 1L),
(u'Employment', u'Tax', 1L),
(u'Environment', u'Business', 2L),
(u'Environment', u'Animal', 2L),
(u'Environment', u'Trees', 2L)]
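The sort and the groupby/islice step can be combined into one helper. The following is only a sketch: the name top_n_per_category is made up for illustration, the rows are a shuffled subset of the question's data, and it is written for Python 3, so the u prefixes and L suffixes are dropped.

```python
from itertools import groupby, islice
from operator import itemgetter


def top_n_per_category(rows, n=3):
    """Return the n highest-count rows of each category as one flat list."""
    # Sort by category, then by descending count, so each category is
    # contiguous and its largest counts come first. sorted() is stable,
    # so rows with equal counts keep their original relative order.
    ordered = sorted(rows, key=lambda row: (row[0], -row[2]))
    return [
        row
        for _, group in groupby(ordered, key=itemgetter(0))
        for row in islice(group, n)
    ]


# Shuffled subset of the question's data (plain ints, no Python 2 longs).
rows = [
    ("Employment", "Money", 1),
    ("Agriculture", "Illness", 1),
    ("Environment", "Business", 2),
    ("Agriculture", "Business", 2),
    ("Employment", "Business", 1),
    ("Agriculture", "Farming", 3),
    ("Employment", "Tax", 1),
    ("Agriculture", "Animal", 2),
    ("Employment", "Debt", 1),
]

top = top_n_per_category(rows)
```

Because the sort happens inside the helper, the input does not need to arrive in any particular order. When counts tie at the cutoff, the rows that appear first in the input win.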
Upvotes: 2
Reputation: 12150
What you seem to need here is groupby().
from itertools import groupby
import pprint
l = [(u'Agriculture', u'Farming', 3L), (u'Agriculture', u'Business', 2L),
(u'Agriculture', u'Animal', 2L), (u'Agriculture', u'Illness', 1L),
(u'Agriculture', u'Health', 1L), (u'Agriculture', u'Disability', 1L),
(u'Agriculture', u'Carers', 1L), (u'Employment', u'Money', 1L),
(u'Employment', u'Business', 1L), (u'Employment', u'Tax', 1L),
(u'Employment', u'Debt', 1L), (u'Employment', u'Budget', 1L),
(u'Environment', u'Business', 2L), (u'Environment', u'Animal', 2L),
(u'Environment', u'Trees', 2L)]
pprint.pprint([sorted(x[1], key=(lambda x: -1*x[2]))[:3]
               for x in groupby(l, lambda x: x[0])])
Which gives:
[[(u'Agriculture', u'Farming', 3L),
(u'Agriculture', u'Business', 2L),
(u'Agriculture', u'Animal', 2L)],
[(u'Employment', u'Money', 1L),
(u'Employment', u'Business', 1L),
(u'Employment', u'Tax', 1L)],
[(u'Environment', u'Business', 2L),
(u'Environment', u'Animal', 2L),
(u'Environment', u'Trees', 2L)]]
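Two details of this approach may be worth spelling out: groupby() only merges adjacent rows, so the input has to be sorted (or at least grouped) by category first, and the result is one list per category rather than the flat list the question asks for. A minimal sketch of both points, assuming Python 3 and a hypothetical helper name top_n_grouped:

```python
from itertools import chain, groupby
from operator import itemgetter


def top_n_grouped(rows, n=3):
    """Return one list per category holding its n highest-count rows."""
    # groupby() only merges adjacent rows, so sort by category first.
    by_category = sorted(rows, key=itemgetter(0))
    return [
        # reverse=True keeps the sort stable, so ties stay in input order.
        sorted(group, key=itemgetter(2), reverse=True)[:n]
        for _, group in groupby(by_category, key=itemgetter(0))
    ]


# Shuffled subset of the question's data (plain ints, no Python 2 longs).
rows = [
    ("Environment", "Trees", 2),
    ("Agriculture", "Health", 1),
    ("Agriculture", "Farming", 3),
    ("Environment", "Animal", 2),
    ("Agriculture", "Business", 2),
    ("Agriculture", "Animal", 2),
]

nested = top_n_grouped(rows)
# Flatten the per-category lists into the single list the question wants.
flat = list(chain.from_iterable(nested))
```

Here nested has one inner list per category, and flat concatenates them in category order.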
Upvotes: 0
Reputation: 4866
You can accomplish that using a list comprehension:
res = [y for y in a if y[2] in sorted([x[2] for x in a if x[0] == y[0]])[-3:]]
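It assumes a is your list of tuples. Note that every tuple whose count ties with one of the three highest counts is kept, so a category can return more than three tuples (Employment returns five below).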
Output:
[(u'Agriculture', u'Farming', 3L),
(u'Agriculture', u'Business', 2L),
(u'Agriculture', u'Animal', 2L),
(u'Employment', u'Money', 1L),
(u'Employment', u'Business', 1L),
(u'Employment', u'Tax', 1L),
(u'Employment', u'Debt', 1L),
(u'Employment', u'Budget', 1L),
(u'Environment', u'Business', 2L),
(u'Environment', u'Animal', 2L),
(u'Environment', u'Trees', 2L)]
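The Employment rows above all share the same count, which is why all five pass the membership test. If a hard cap of three per category is wanted, one option is to bucket the rows per category and slice each bucket. This is a sketch of a different technique, not the answer's method; top_n_capped is a hypothetical name and the code assumes Python 3.

```python
from collections import defaultdict
from operator import itemgetter


def top_n_capped(rows, n=3):
    """Return at most n rows per category, highest counts first."""
    # Bucket rows by category; the input does not need to be sorted.
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[0]].append(row)
    result = []
    for category in sorted(buckets):
        # Stable descending sort: tied rows stay in input order.
        best = sorted(buckets[category], key=itemgetter(2), reverse=True)
        result.extend(best[:n])
    return result


# The Employment rows from the question (plain ints, no Python 2 longs).
rows = [
    ("Employment", "Money", 1),
    ("Employment", "Business", 1),
    ("Employment", "Tax", 1),
    ("Employment", "Debt", 1),
    ("Employment", "Budget", 1),
]

capped = top_n_capped(rows)
# The list comprehension from the answer, for comparison.
uncapped = [y for y in rows
            if y[2] in sorted([x[2] for x in rows if x[0] == y[0]])[-3:]]
```

With these rows, capped holds three tuples while uncapped holds all five. The bucketed version also makes a single pass over the input plus one sort per category, whereas the comprehension rescans the whole list for every tuple.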
Upvotes: 0