jfalkson
jfalkson

Reputation: 3739

Checking which groups in pandas group by meet a certain condition

Suppose I have a pandas dataframe such as:

id | name

1    foo

1    bar

2    foo

3    bar

My goal is to get the % of distinct ids that have both 'foo' and 'bar'

So far I've attempted df.groupby('id').name.nunique(), I don't want a count of unique names, I want to search for foo and bar specifically.

Can someone help me out?

Upvotes: 0

Views: 269

Answers (2)

Max Power
Max Power

Reputation: 8996

names_per_id = df.groupby('id')['name'].unique()

gets you a Series of lists, with:

id
1    [foo, bar]
2         [foo]
3         [bar]

from there we can get the ids which have both foo and bar:

foo_and_bar = names_per_id.apply(lambda row: ('foo' in row) and ('bar' in row))

1     True
2    False
3    False

finally, calculating the percentage you asked for:

pct = foo_and_bar.mean()

0.33333333333333331

Upvotes: 2

Ted Petrou
Ted Petrou

Reputation: 62007

Here is a general solution. Just replace the ['foo', 'bar'] list with a custom list of your choice.

res = df.groupby('id')['name'].agg(lambda s: np.all(np.in1d(['foo', 'bar'], s)))

Which outputs this:

     name
id       
1    True
2   False
3   False

Then just take the mean of this Series

res.mean()

Output

name    0.333333
dtype: float64

To keep it all in pandas do:

df.groupby('id')['name'].agg(lambda s: pd.Series(['foo', 'bar']).isin(s).all()).mean()

Upvotes: 1

Related Questions