rainy days.

Reputation: 31

Count Distinct Values Based on Certain Values on Certain Column

I have a pandas dataframe that looks like this:

name    category  status
John    student   yes
Jane    employee  no
Elijah  student   no
Anne    student   yes
Elle    employee  no

For each category, I want to count the number of rows that have status 'yes'.

I have tried the two snippets below:

  1. (DataFrame['status'].eq('yes').groupby(DataFrame['category']).nunique())
  2. (DataFrame['status'].eq('yes').groupby(DataFrame['category']).any().sum())
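To reproduce the problem, here is a minimal self-contained sketch that builds the sample frame from the table above and runs both attempts. It assumes the frame is stored in a variable named DataFrame, as in the snippets. Note that with this data the second attempt evaluates to a single number (1) rather than a per-category result, because sum() runs after any() has already reduced each group to one boolean.

```python
import pandas as pd

# Sample data from the question; the variable name DataFrame mirrors the snippets
DataFrame = pd.DataFrame({
    'name': ['John', 'Jane', 'Elijah', 'Anne', 'Elle'],
    'category': ['student', 'employee', 'student', 'student', 'employee'],
    'status': ['yes', 'no', 'no', 'yes', 'no'],
})

# Boolean mask: True where status is 'yes'
is_yes = DataFrame['status'].eq('yes')

# Attempt 1: counts how many *distinct* booleans each group has, not how many True values
attempt1 = is_yes.groupby(DataFrame['category']).nunique()

# Attempt 2: any() yields one bool per group, then sum() collapses those into one number
attempt2 = is_yes.groupby(DataFrame['category']).any().sum()

print(attempt1)
print(attempt2)
```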

Both snippets give the same output:

category
student     2
employee    1

But this is the output I expect:

category
student     2
employee    0

Can you help me fix this?

Upvotes: 1

Views: 54

Answers (1)

jezrael

Reputation: 863801

To count the True values, aggregate with sum, because True is treated as 1 and False as 0:

s = (DataFrame['status'].eq('yes').groupby(DataFrame['category']).sum())
print (s)
category
employee    0
student     2
Name: status, dtype: int64
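For a version you can paste and run directly, the sketch below rebuilds the question's sample data (assuming, as in the question, that the frame lives in a variable named DataFrame) and applies the same sum aggregation. groupby sorts the group keys by default, which is why employee is listed before student.

```python
import pandas as pd

# Sample data from the question
DataFrame = pd.DataFrame({
    'name': ['John', 'Jane', 'Elijah', 'Anne', 'Elle'],
    'category': ['student', 'employee', 'student', 'student', 'employee'],
    'status': ['yes', 'no', 'no', 'yes', 'no'],
})

# eq('yes') gives a boolean Series; summing it per group counts the True values
# (True -> 1, False -> 0), so a group with no 'yes' rows still appears, with 0
s = DataFrame['status'].eq('yes').groupby(DataFrame['category']).sum()
print(s)
```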

Aggregating with nunique instead counts the distinct values in each group: the student group contains both True and False, so it returns 2, and the employee group contains only False, so it returns 1 (that group has no 'yes').

To verify this, print the unique values per group:

print ((DataFrame['status'].eq('yes').groupby(DataFrame['category']).unique()))
category
employee          [False]
student     [True, False]
Name: status, dtype: object

Aggregating with any instead tests whether each group contains at least one True, so the output is different:

print ((DataFrame['status'].eq('yes').groupby(DataFrame['category']).any()))
category
employee    False
student      True
Name: status, dtype: bool
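Simply filtering to the 'yes' rows and then counting would drop employee entirely, since it has no 'yes' rows. Two other standard pandas idioms, which are alternatives added here rather than part of the answer above, keep the zero: pd.crosstab, or value_counts followed by reindex with fill_value=0. A sketch on the same sample data (again assuming the variable name DataFrame):

```python
import pandas as pd

# Sample data from the question
DataFrame = pd.DataFrame({
    'name': ['John', 'Jane', 'Elijah', 'Anne', 'Elle'],
    'category': ['student', 'employee', 'student', 'student', 'employee'],
    'status': ['yes', 'no', 'no', 'yes', 'no'],
})

# Alternative 1: cross-tabulate category against status and take the 'yes' column;
# categories without any 'yes' rows get 0 in that column
via_crosstab = pd.crosstab(DataFrame['category'], DataFrame['status'])['yes']

# Alternative 2: count categories among the 'yes' rows only, then reindex over all
# categories so the ones that were filtered out reappear with a count of 0
all_categories = DataFrame['category'].unique()
via_value_counts = (
    DataFrame.loc[DataFrame['status'].eq('yes'), 'category']
    .value_counts()
    .reindex(all_categories, fill_value=0)
)

print(via_crosstab)
print(via_value_counts)
```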

Upvotes: 1
