Mark

Reputation: 2542

Get top 3 elements from a list of tuples

I have the following list of tuples, each made up of (Category, Tag, TagCount). They are ordered by Category and then by descending TagCount.

[(u'Agriculture', u'Farming', 3L), (u'Agriculture', u'Business', 2L), (u'Agriculture', u'Animal', 2L), (u'Agriculture', u'Illness', 1L), (u'Agriculture', u'Health', 1L), (u'Agriculture', u'Disability', 1L), 
(u'Agriculture', u'Carers', 1L), (u'Employment', u'Money', 1L), (u'Employment', u'Business', 1L), 
(u'Employment', u'Tax', 1L), (u'Employment', u'Debt', 1L), (u'Employment', u'Budget', 1L), 
(u'Environment', u'Business', 2L), (u'Environment', u'Animal', 2L), (u'Environment', u'Trees', 2L)]

I want to get the top 3 tuples in each category, so I want to return:

[(u'Agriculture', u'Farming', 3L), (u'Agriculture', u'Business', 2L), (u'Agriculture', u'Animal', 2L),
(u'Employment', u'Money', 1L), (u'Employment', u'Business', 1L), (u'Employment', u'Tax', 1L), 
(u'Environment', u'Business', 2L), (u'Environment', u'Animal', 2L), (u'Environment', u'Trees', 2L)]

I know I can get this working with for loops and counters, but I feel there might be an easy way using lambdas that I am completely missing.

Here is what I have that works:

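# results is the list of tuples shown above
output = []
counter = 0
last_category = None
for res in results:
    category = res[0]
    # A new category starts: reset the counter and remember the category
    if category != last_category:
        counter = 0
        last_category = category
    # Keep at most the first 3 tuples of each category
    if counter < 3:
        output.append(res)
        counter += 1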

Upvotes: 0

Views: 253

Answers (3)

Moses Koledoye

Reputation: 78554

You can group the items first and then slice the first 3 items from each group:

from itertools import groupby, islice
from operator import itemgetter
from pprint import pprint

# lst is the list of tuples from the question
f = itemgetter(0)  # group key: the category
r = [i for _, g in groupby(lst, f) for i in islice(g, 3)]
pprint(r)

For a general case, if the items are not already sorted (by category and count), then you can do an initial sort using:

lst = sorted(lst, key=lambda x: (x[0], -x[2]))

This sorts by category and then by descending count.

Output for the list in the question:

[(u'Agriculture', u'Farming', 3L),
 (u'Agriculture', u'Business', 2L),
 (u'Agriculture', u'Animal', 2L),
 (u'Employment', u'Money', 1L),
 (u'Employment', u'Business', 1L),
 (u'Employment', u'Tax', 1L),
 (u'Environment', u'Business', 2L),
 (u'Environment', u'Animal', 2L),
 (u'Environment', u'Trees', 2L)]
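
As a minimal, self-contained sketch of the two steps above (sort, then group and slice), assuming Python 3 literals in place of the question's `u'...'` and `3L` forms. The helper name `top_n_per_category` and the shuffled sample `rows` are made up for illustration and are not part of the original answer:

```python
from itertools import groupby, islice
from operator import itemgetter

# Part of the question's data, deliberately out of order,
# written with Python 3 literals (no u'' prefix, no L suffix).
rows = [
    ('Employment', 'Money', 1), ('Agriculture', 'Illness', 1),
    ('Agriculture', 'Farming', 3), ('Environment', 'Trees', 2),
    ('Agriculture', 'Business', 2), ('Employment', 'Tax', 1),
    ('Agriculture', 'Animal', 2), ('Employment', 'Business', 1),
    ('Employment', 'Debt', 1),
]

def top_n_per_category(items, n=3):
    """Hypothetical helper: first n tuples per category, highest counts first."""
    # groupby only merges adjacent items, so sort by category first,
    # then by descending count within each category.
    ordered = sorted(items, key=lambda t: (t[0], -t[2]))
    return [t for _, group in groupby(ordered, key=itemgetter(0))
            for t in islice(group, n)]

top = top_n_per_category(rows)
for row in top:
    print(row)
```

Because `sorted` is stable, tuples with equal counts keep their original relative order, so which of several tied tags makes the cut depends on the input order.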

Upvotes: 2

ffledgling

Reputation: 12150

What you seem to need here is groupby().

from itertools import groupby
import pprint

l = [(u'Agriculture', u'Farming', 3L), (u'Agriculture', u'Business', 2L),
        (u'Agriculture', u'Animal', 2L), (u'Agriculture', u'Illness', 1L),
        (u'Agriculture', u'Health', 1L), (u'Agriculture', u'Disability', 1L),
        (u'Agriculture', u'Carers', 1L), (u'Employment', u'Money', 1L),
        (u'Employment', u'Business', 1L), (u'Employment', u'Tax', 1L),
        (u'Employment', u'Debt', 1L), (u'Employment', u'Budget', 1L),
        (u'Environment', u'Business', 2L), (u'Environment', u'Animal', 2L),
        (u'Environment', u'Trees', 2L)]

pprint.pprint([sorted(x[1], key=(lambda x: -1*x[2]))[:3] 
               for x in groupby(l, lambda x: x[0])])

Which gives:

[[(u'Agriculture', u'Farming', 3L),
  (u'Agriculture', u'Business', 2L),
  (u'Agriculture', u'Animal', 2L)],
 [(u'Employment', u'Money', 1L),
  (u'Employment', u'Business', 1L),
  (u'Employment', u'Tax', 1L)],
 [(u'Environment', u'Business', 2L),
  (u'Environment', u'Animal', 2L),
  (u'Environment', u'Trees', 2L)]]
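
The result above is one list per category, while the question asks for a single flat list. One possible way to flatten it is `itertools.chain.from_iterable`; the sketch below assumes Python 3 literals and a shortened copy of the data, and the names `nested` and `flat` are only illustrative:

```python
from itertools import chain, groupby

# Shortened sample from the question, written with Python 3 literals.
l = [('Agriculture', 'Farming', 3), ('Agriculture', 'Business', 2),
     ('Agriculture', 'Animal', 2), ('Agriculture', 'Illness', 1),
     ('Employment', 'Money', 1), ('Employment', 'Business', 1),
     ('Employment', 'Tax', 1), ('Employment', 'Debt', 1)]

# One list of up to three tuples per category (same idea as above).
nested = [sorted(group, key=lambda t: -t[2])[:3]
          for _, group in groupby(l, key=lambda t: t[0])]

# chain.from_iterable concatenates the per-category lists into one flat list.
flat = list(chain.from_iterable(nested))
print(flat)
```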

Upvotes: 0

Carles Mitjans

Reputation: 4866

You can accomplish that using a list comprehension:

res = [y for y in a if y[2] in sorted([x[2] for x in a if x[0] == y[0]])[-3:]]

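It assumes a is your list of tuples. Note that it keeps every tuple whose count matches one of the three highest counts in its category, so ties can yield more than three tuples per category (see Employment in the output).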

Output:

[(u'Agriculture', u'Farming', 3L),
 (u'Agriculture', u'Business', 2L),
 (u'Agriculture', u'Animal', 2L),
 (u'Employment', u'Money', 1L),
 (u'Employment', u'Business', 1L),
 (u'Employment', u'Tax', 1L),
 (u'Employment', u'Debt', 1L),
 (u'Employment', u'Budget', 1L),
 (u'Environment', u'Business', 2L),
 (u'Environment', u'Animal', 2L),
 (u'Environment', u'Trees', 2L)]
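
The five Employment rows in that output come from ties: the comprehension tests membership in the three highest counts rather than position. A small sketch that makes this visible, assuming Python 3 literals and only two of the question's categories (the `per_category` name is just for illustration):

```python
from collections import Counter

# Two categories from the question, written with Python 3 literals.
a = [('Employment', 'Money', 1), ('Employment', 'Business', 1),
     ('Employment', 'Tax', 1), ('Employment', 'Debt', 1),
     ('Employment', 'Budget', 1), ('Environment', 'Business', 2),
     ('Environment', 'Animal', 2), ('Environment', 'Trees', 2)]

# Same comprehension as above: keep y if its count is among the
# three largest counts found in y's category.
res = [y for y in a
       if y[2] in sorted([x[2] for x in a if x[0] == y[0]])[-3:]]

# Count how many tuples survived per category.
per_category = Counter(y[0] for y in res)
print(per_category)
```

Every Employment count is 1, so all five tuples pass the membership test. The comprehension also rescans the whole list for each element, which is quadratic in the list length.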

Upvotes: 0
