Reputation: 3739
Suppose I have a pandas dataframe such as:
id | name
1  | foo
1  | bar
2  | foo
3  | bar
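For reference, here is a reproducible version of the sample data (a minimal sketch; the int/str column dtypes are assumed):
import pandas as pd

# Sample data matching the table above
df = pd.DataFrame({
    'id':   [1, 1, 2, 3],
    'name': ['foo', 'bar', 'foo', 'bar'],
})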
My goal is to get the % of distinct ids that have both 'foo' and 'bar'
So far I've attempted df.groupby('id').name.nunique(), but I don't want a count of unique names; I want to check for 'foo' and 'bar' specifically.
Can someone help me out?
Upvotes: 0
Views: 269
Reputation: 8996
names_per_id = df.groupby('id')['name'].unique()
gets you a Series of arrays (the unique names for each id):
id
1 [foo, bar]
2 [foo]
3 [bar]
from there we can get the ids which have both foo and bar:
foo_and_bar = names_per_id.apply(lambda row: ('foo' in row) and ('bar' in row))
id
1 True
2 False
3 False
finally, calculating the percentage you asked for:
pct = foo_and_bar.mean()
0.33333333333333331
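Putting the three steps together as a self-contained sketch (the sample DataFrame is rebuilt here for convenience):
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 3],
                   'name': ['foo', 'bar', 'foo', 'bar']})

# Unique names per id, e.g. id 1 -> [foo, bar]
names_per_id = df.groupby('id')['name'].unique()

# True for ids whose names include both 'foo' and 'bar'
foo_and_bar = names_per_id.apply(lambda names: ('foo' in names) and ('bar' in names))

# The mean of a boolean Series is the fraction of True values
pct = foo_and_bar.mean()
print(pct)  # 0.3333333333333333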
Upvotes: 2
Reputation: 62007
Here is a general solution. Just replace the ['foo', 'bar'] list with a custom list of your choice.
import numpy as np

res = df.groupby('id')['name'].agg(lambda s: np.all(np.in1d(['foo', 'bar'], s)))
Which outputs this:
id
1 True
2 False
3 False
Then just take the mean of this Series:
res.mean()
Output:
0.3333333333333333
To keep it all in pandas, do:
df.groupby('id')['name'].agg(lambda s: pd.Series(['foo', 'bar']).isin(s).all()).mean()
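For completeness, a runnable sketch of both variants (imports and sample data added here; np.isin is the modern spelling of np.in1d, and the targets list is just a placeholder to customize):
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 3],
                   'name': ['foo', 'bar', 'foo', 'bar']})

targets = ['foo', 'bar']  # replace with your own list of names

# NumPy variant: True where every target name appears in the group
res = df.groupby('id')['name'].agg(lambda s: np.all(np.isin(targets, s)))
print(res.mean())  # 0.3333333333333333

# Pure-pandas variant: the same test via Series.isin
pct = df.groupby('id')['name'].agg(lambda s: pd.Series(targets).isin(s).all()).mean()
print(pct)  # 0.3333333333333333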
Upvotes: 1