jdowdell
jdowdell

Reputation: 1638

How to filter on the Groupby Criterion in Pandas?

Suppose the following contrived setup:

import pandas as pd
d = {'fname': ['bob', 'Bob', 'larry', 'LARRY', 'Larry', 'Dick'], 
     'lname': ['harris', 'Larson', 'Douglas', 'REDMOND', 'Beal', 'Dyke']}
df = pd.DataFrame(d)
g = df.groupby(df.fname.str.lower())

query = ['bob', 'dick', 'chris']

In plain english, I want to create a view of the overall Dataframe, for entries whose first name are in the query, ignoring case change. I (think I) would like to do the equivalent of an efficient and idiomatic filter() on g to find and combine those groups that correspond to entries in query, into a single DataFrame, viz:

   fname    lname
0    bob   harris
1    Bob   Larson
5   Dick     Dyke

However, filter() seems to iterate over the entire set of groups (important when df is huge and query is small), and anyway I can't seem to access the group-name from within filter() anyway.

The best I could come up with:

pd.concat([pd.DataFrame()] + map(lambda y: g.get_group(y), 
                                 filter(lambda x: x in g.groups, query)))

But I suspect this is not efficient or idiomatic.

UPDATE:

I should have clarified that in the real world problem backing this, there is only one, very large df, but there are several independent, small query instances. isin would probably work fine for just one query, but I've found considerable speed-up using the Groupby once, followed by individual lookups per query into it as written with the map/filter combo above.

Upvotes: 1

Views: 51

Answers (2)

BENY
BENY

Reputation: 323236

df[(df.fname.str.lower()).str.contains(r'|'.join(query),regex=True)]
Out[20]: 
  fname   lname
0   bob  harris
1   Bob  Larson
5  Dick    Dyke

Upvotes: 1

Vaishali
Vaishali

Reputation: 38415

I don't know if I am missing something here but simple boolean indexing using isin looks enough.

df[df.fname.str.lower().isin(query)]

    fname   lname
0   bob     harris
1   Bob     Larson
5   Dick    Dyke

Upvotes: 1

Related Questions