Reputation: 1638
Suppose the following contrived setup:
import pandas as pd
d = {'fname': ['bob', 'Bob', 'larry', 'LARRY', 'Larry', 'Dick'],
     'lname': ['harris', 'Larson', 'Douglas', 'REDMOND', 'Beal', 'Dyke']}
df = pd.DataFrame(d)
g = df.groupby(df.fname.str.lower())
query = ['bob', 'dick', 'chris']
In plain English, I want to create a view of the overall DataFrame for entries whose first names appear in the query, ignoring case.
I (think I) would like an efficient and idiomatic equivalent of filter() on g that finds and combines the groups corresponding to entries in query into a single DataFrame, viz:
fname lname
0 bob harris
1 Bob Larson
5 Dick Dyke
However, filter() seems to iterate over the entire set of groups (which matters when df is huge and query is small), and in any case I can't seem to access the group name from within filter().
The best I could come up with:
pd.concat([pd.DataFrame()] + list(map(g.get_group,
          filter(lambda x: x in g.groups, query))))
But I suspect this is not efficient or idiomatic.
UPDATE:
I should have clarified that in the real-world problem behind this, there is only one, very large df, but several independent, small query instances. isin would probably work fine for a single query, but I've found a considerable speed-up from doing the groupby once and then performing individual lookups per query into it, as in the map/filter combo above.
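For illustration, that groupby-once, lookup-per-query pattern can be sketched like this (a sketch using the toy df from the question; the lookup helper name is my own):

```python
import pandas as pd

d = {'fname': ['bob', 'Bob', 'larry', 'LARRY', 'Larry', 'Dick'],
     'lname': ['harris', 'Larson', 'Douglas', 'REDMOND', 'Beal', 'Dyke']}
df = pd.DataFrame(d)

# Build the groupby once; each group is keyed by the lowercased first name.
g = df.groupby(df.fname.str.lower())

def lookup(g, query):
    """Concatenate the groups whose (lowercased) name appears in query."""
    frames = [g.get_group(q) for q in query if q in g.groups]
    # The empty frame keeps pd.concat from raising when nothing matches.
    return pd.concat([pd.DataFrame()] + frames)

print(lookup(g, ['bob', 'dick', 'chris']))
```

Each lookup only touches the queried groups, which is what avoids scanning the whole DataFrame per query.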
Upvotes: 1
Views: 51
Reputation: 323236
df[(df.fname.str.lower()).str.contains(r'|'.join(query),regex=True)]
Out[20]:
fname lname
0 bob harris
1 Bob Larson
5 Dick Dyke
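One caveat with this approach: str.contains matches substrings, so a query like 'bob' would also match a name like 'bobby'. If that matters, requiring the whole name to match one alternative avoids it (a sketch, assuming pandas >= 1.1 for str.fullmatch):

```python
import pandas as pd

d = {'fname': ['bob', 'Bob', 'larry', 'LARRY', 'Larry', 'Dick'],
     'lname': ['harris', 'Larson', 'Douglas', 'REDMOND', 'Beal', 'Dyke']}
df = pd.DataFrame(d)
query = ['bob', 'dick', 'chris']

# fullmatch requires the entire lowercased name to equal one alternative,
# rather than merely containing it somewhere.
result = df[df.fname.str.lower().str.fullmatch('|'.join(query))]
print(result)
```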
Upvotes: 1
Reputation: 38415
I don't know if I am missing something here, but simple boolean indexing using isin looks sufficient.
df[df.fname.str.lower().isin(query)]
fname lname
0 bob harris
1 Bob Larson
5 Dick Dyke
Upvotes: 1