Reputation: 1638
Suppose the following contrived setup:
import pandas as pd
d = {'fname': ['bob', 'Bob', 'larry', 'LARRY', 'Larry', 'Dick'],
     'lname': ['harris', 'Larson', 'Douglas', 'REDMOND', 'Beal', 'Dyke']}
df = pd.DataFrame(d)
g = df.groupby(df.fname.str.lower())
query = ['bob', 'dick', 'chris']
In plain English, I want to create a view of the overall DataFrame for entries whose first names appear in the query, ignoring case.
I (think I) would like an efficient and idiomatic equivalent of filter() on g that finds and combines the groups corresponding to entries in query into a single DataFrame, viz:
fname lname
0 bob harris
1 Bob Larson
5 Dick Dyke
However, filter() seems to iterate over the entire set of groups (which matters when df is huge and query is small), and in any case I can't seem to access the group name from within filter().
The best I could come up with:
pd.concat([pd.DataFrame()] + list(map(g.get_group,
          filter(lambda x: x in g.groups, query))))
But I suspect this is not efficient or idiomatic.
UPDATE:
I should have clarified that in the real-world problem behind this, there is only one, very large df, but several independent, small query instances. isin would probably work fine for a single query, but I've found a considerable speed-up from doing the groupby once and then performing individual lookups per query into it, as in the map/filter combo above.
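For illustration, that groupby-once, lookup-per-query pattern can be sketched like this (a sketch using the toy df from the question; the lookup helper name is my own):

```python
import pandas as pd

d = {'fname': ['bob', 'Bob', 'larry', 'LARRY', 'Larry', 'Dick'],
     'lname': ['harris', 'Larson', 'Douglas', 'REDMOND', 'Beal', 'Dyke']}
df = pd.DataFrame(d)

# Build the groupby once; each group is keyed by the lowercased first name.
g = df.groupby(df.fname.str.lower())

def lookup(g, query):
    """Concatenate the groups whose (lowercased) name appears in query."""
    frames = [g.get_group(q) for q in query if q in g.groups]
    # The empty frame keeps pd.concat from raising when nothing matches.
    return pd.concat([pd.DataFrame()] + frames)

print(lookup(g, ['bob', 'dick', 'chris']))
```

Each lookup only touches the queried groups, which is what avoids scanning the whole DataFrame per query.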
Upvotes: 1
Views: 51
Reputation: 323236
df[(df.fname.str.lower()).str.contains(r'|'.join(query),regex=True)]
Out[20]:
fname lname
0 bob harris
1 Bob Larson
5 Dick Dyke
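One caveat with this approach: str.contains matches substrings, so a query like 'bob' would also match a name like 'bobby'. If that matters, requiring the whole name to match one alternative avoids it (a sketch, assuming pandas >= 1.1 for str.fullmatch):

```python
import pandas as pd

d = {'fname': ['bob', 'Bob', 'larry', 'LARRY', 'Larry', 'Dick'],
     'lname': ['harris', 'Larson', 'Douglas', 'REDMOND', 'Beal', 'Dyke']}
df = pd.DataFrame(d)
query = ['bob', 'dick', 'chris']

# fullmatch requires the entire lowercased name to equal one alternative,
# rather than merely containing it somewhere.
result = df[df.fname.str.lower().str.fullmatch('|'.join(query))]
print(result)
```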
Upvotes: 1
Reputation: 38415
I don't know if I am missing something here, but simple boolean indexing using isin looks sufficient.
df[df.fname.str.lower().isin(query)]
fname lname
0 bob harris
1 Bob Larson
5 Dick Dyke
Upvotes: 1