How to filter results of a groupby in pandas

Question

I am trying to filter out a result of a groupby.

I have this table:

A       B       C

A0      B0      0.5
A1      B0      0.2
A2      B1      0.6
A3      B1      0.4
A4      B2      1.0
A5      B2      1.2

A is the index and it is unique.

Secondly I have this list:

['A0', 'A1', 'A4']
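For anyone who wants to reproduce this, the table and the list can be built roughly as follows. This is only a sketch: the name `x` matches the one used in the answer further down, while `preferred` is a made-up name for the list.

```python
import pandas as pd

# Rebuild the example table; column A becomes the (unique) index.
x = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3", "A4", "A5"],
        "B": ["B0", "B0", "B1", "B1", "B2", "B2"],
        "C": [0.5, 0.2, 0.6, 0.4, 1.0, 1.2],
    }
).set_index("A")

# Index labels that take priority within each group (hypothetical name).
preferred = ["A0", "A1", "A4"]
```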

I want to group by B and, for each group, extract the row with the highest value of C. If a group contains rows whose index is in the list above, the row must be picked from those rows only; otherwise it is picked from all the rows in the group.

The result for this data and list has to be:

A       B       C

A0      B0      0.5
A2      B1      0.6
A4      B2      1.0

The pseudo code for this I think has to be:

group by B
for each group G:
    intersect group G rows index with indexes in the list
    if intersection is not empty:
        the group G becomes the intersection
    sort the rows by C in descending order
    take the first row as representative for this group

How can I do it in pandas?

Thanks

LondonRob · Accepted Answer

Here's a general solution. It's not pretty but it works:

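def filtermax(g, filter_on, filter_items, max_over):
    # filter_on (the index name) is unused; it is kept so the call below still works.
    infilter = g.index.isin(filter_items)
    # Restrict to the listed rows when the group contains any of them.
    candidates = g[infilter] if infilter.any() else g
    # Keep the row(s) holding the maximum of max_over among the candidates.
    return candidates[candidates[max_over] == candidates[max_over].max()]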

which gives:

>>> x.groupby('B').apply(filtermax, 'A', ['A0', 'A1', 'A4'], 'C')
        B    C
B  A          
B0 A0  B0  0.5
B1 A2  B1  0.6
B2 A4  B2  1.0

If anyone can work out how to stop B being added as an index (at least on my system, x.groupby('B', as_index=False) doesn't help!) then this solution's pretty much perfect!
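One way to sidestep the extra index level is to avoid `apply` altogether. The sketch below is an alternative technique, not the approach above: it flags the listed rows, sorts so that listed rows and larger values of C come first, and keeps the first row for each value of B. The names `preferred` and `_preferred` are made up for illustration. If you would rather stay with `apply`, passing `group_keys=False` to `groupby` is, as far as I know, the option meant to keep the group label out of the result's index, though how `apply` treats the grouping column itself has changed between pandas versions.

```python
import pandas as pd

# Same example data as in the question; A is the unique index.
x = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3", "A4", "A5"],
        "B": ["B0", "B0", "B1", "B1", "B2", "B2"],
        "C": [0.5, 0.2, 0.6, 0.4, 1.0, 1.2],
    }
).set_index("A")
preferred = ["A0", "A1", "A4"]  # hypothetical name for the priority list

# Flag rows whose index is in the list, then order the frame so that
# flagged rows come first and, within each flag value, larger C comes first.
ranked = x.assign(_preferred=x.index.isin(preferred)).sort_values(
    ["_preferred", "C"], ascending=[False, False]
)

# The first row seen for each B is the wanted one; drop the helper column
# and restore the original index order.
result = ranked.drop_duplicates(subset="B").drop(columns="_preferred").sort_index()
print(result)
```

Because nothing is grouped with `apply`, the result keeps A as a plain single-level index.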
