user9617403
user9617403

Reputation: 59

specific values count in groupby

I have the following dataframe:

df = pd.DataFrame([
    ('A', 'A', 'Cat'),
    ('A', 'A', 'Dog'),
    ('A', 'A', 'Cat'),
    ('A', 'B', 'Dog'),
    ('B', 'B', 'Rat'),
    ('B', 'C', 'Cat'),
    ('B', 'C', 'Cat')
], columns=['id', 'group', 'Animal'])

I want to group it by id and group and calculate the occurrence of Cat in each group. An example output will be:

[2, 0, 0, 2]

2 cat in group AA,

0 cat in group AB,

0 cat in group BB,

2 cat in group BC

Can anyone help? Thanks!

Upvotes: 4

Views: 21466

Answers (4)

Reslan Tinawi
Reslan Tinawi

Reputation: 557

Your question is very similar (if not duplicate) to this question.

When you want to perform some aggregation which are not part of pandas (e.g. sum, max, min) then you should use the apply method.

Please note that apply can be slower, so don't start using everywhere.

Now, for your question, you start with grouping by:

df.groupby(by=["id", "group"])

Then, you select the Animal column from the groups:

df.groupby(by=["id", "group"])['Animal']

At this point, you have values from the Animal column for each different group, ready for any aggregation.

Let's define a function to count Cats in a pandas Series:

def count_cat(series):
    return (series == 'Cat').sum()

All you have to do now, is to pass this function to the apply method, like the following:

df.groupby(by=["id", "group"])['Animal'].apply(count_cat).reset_index(name='count_cat')

And the result would be:

id group count_cat
0 A A 2
1 A B 0
2 B B 0
3 B C 2

Upvotes: 5

David Saghinadze
David Saghinadze

Reputation: 1

if 0 values are not important than you can try:

list(df[df['Animal']=='Cat'].groupby(['id', 'group']).count()['Animal'])

Sorry for my English

Upvotes: 0

cs95
cs95

Reputation: 403218

You'll need mask + groupby:

df['Animal'] = df['Animal'].mask(df['Animal'].ne('Cat'))
df.groupby(['id', 'group'])['Animal'].count().tolist()

[2, 0, 0, 2]

Upvotes: 9

BENY
BENY

Reputation: 323396

You can using value_counts

df.groupby(['id','group']).Animal.value_counts().unstack(fill_value=0).loc[:,'Cat'].tolist()
Out[144]: [2, 0, 0, 2]

More info return before the last step .loc

df.groupby(['id','group']).Animal.value_counts().unstack(fill_value=0)
Out[145]: 
Animal    Cat  Dog  Rat
id group               
A  A        2    1    0
   B        0    1    0
B  B        0    0    1
   C        2    0    0

Upvotes: 7

Related Questions