Michael Mathews Jr.

Reputation: 329

Using lambda functions in groupby.agg, pandas

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({
    'animal': ['dog', 'dog', 'cat', 'dog', 'cat'],
    'color': ['brown', 'black', 'white', 'black', 'black']})

I am trying to write a groupby function like this:

df.groupby('animal').agg(
    proportion_of_black=('color', lambda x: 1 if x == 'black' else 0)).reset_index()

It raises the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Where is my code going wrong?

Upvotes: 1

Views: 5937

Answers (3)

Onyambu

Reputation: 79318

Since your question asks for a proportion rather than a count, you can take the mean of a boolean mask (True counts as 1, False as 0):

df.groupby(['animal']).agg(
   proportion=('color', lambda x: x.eq('black').mean())).reset_index()

    animal  proportion
0   cat     0.500000
1   dog     0.666667
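A lambda-free variant of the same idea, sketched under the assumption that you want to keep your original column name proportion_of_black: precompute a boolean helper column (the name is_black is hypothetical) and aggregate it with the built-in 'mean' through named aggregation.

```python
import pandas as pd

df = pd.DataFrame({
    'animal': ['dog', 'dog', 'cat', 'dog', 'cat'],
    'color': ['brown', 'black', 'white', 'black', 'black']})

# Hypothetical helper column: True where the color is black, False otherwise
result = (df.assign(is_black=df['color'].eq('black'))
            .groupby('animal', as_index=False)
            # Mean of a boolean column = share of True values per group
            .agg(proportion_of_black=('is_black', 'mean')))
print(result)
```

Using the string 'mean' lets pandas run its built-in aggregation instead of calling a Python function once per group.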

Upvotes: 6

Quang Hoang

Reputation: 150785

Where is my code going wrong? When you do:

df.groupby('animal').agg(
proportion_of_black=('color', lambda x: 1 if x == 'black' else 0))

x is the color Series for each animal, e.g. df.loc[df['animal']=='dog', 'color']. So x == 'black' is a Series of booleans. However, Python's if statement only accepts a single boolean, and pandas doesn't know how to reduce the Series x == 'black' to one boolean for if x == 'black', so it raises the error you see.
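A minimal reproduction of that explanation, assuming a two-row group such as the cat group: the element-wise comparison yields a boolean Series, using it in an if raises the ValueError, and reducing it first (here with any() or mean()) gives a single value Python can work with.

```python
import pandas as pd

x = pd.Series(['black', 'white'])  # e.g. the 'color' values of one group
mask = x == 'black'                # element-wise comparison -> Series of booleans
print(mask.tolist())               # [True, False]

caught = None
try:
    if mask:                       # `if` needs one bool; a Series has no single truth value
        pass
except ValueError as err:
    caught = err
print(type(caught).__name__)       # ValueError

# Reducing the Series to a single value makes it usable
print(bool(mask.any()), mask.mean())  # True 0.5
```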

How to fix your code: apply-style Python lambdas should be avoided where a vectorized method exists, even after groupby(). In your case, you can get the proportion of black with mean():

df['color'].eq('black').groupby(df['animal']).mean()

Output:

animal
cat    0.500000
dog    0.666667
Name: color, dtype: float64
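If you want a DataFrame with a named column, as in your original reset_index() call, the Series above can be renamed and reset. A short sketch, assuming the column name proportion_of_black from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'animal': ['dog', 'dog', 'cat', 'dog', 'cat'],
    'color': ['brown', 'black', 'white', 'black', 'black']})

proportion = (df['color'].eq('black')
                .groupby(df['animal'])
                .mean()
                .rename('proportion_of_black')  # Series name becomes the column name
                .reset_index())                 # 'animal' index becomes a regular column
print(proportion)
```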

Upvotes: 2

BENY

Reputation: 323326

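To fix your code, use any(). Note that this gives a 1/0 flag for whether the group contains any black animal, not a proportion: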

df.groupby('animal').agg(
proportion_of_black=('color', lambda x: 1 if any(x == 'black') else 0)).reset_index()
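The same 1/0 flag can be computed without a Python lambda, using the grouped any() reduction on the boolean mask. A sketch, where the column name has_black is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'animal': ['dog', 'dog', 'cat', 'dog', 'cat'],
    'color': ['brown', 'black', 'white', 'black', 'black']})

has_black = (df['color'].eq('black')
               .groupby(df['animal'])
               .any()                # True if at least one black row in the group
               .astype(int)          # convert True/False to 1/0
               .rename('has_black')  # hypothetical column name
               .reset_index())
print(has_black)
```

With this data both groups contain a black animal, so both flags are 1.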

If you need the count of black animals:

df.groupby('animal').agg(
proportion_of_black=('color', lambda x: sum(x == 'black') )).reset_index()
Out[124]: 
  animal  proportion_of_black
0    cat                    1
1    dog                    2
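A vectorized way to get the same counts, sketched here as an alternative to calling Python's built-in sum inside a lambda; the mask is cast to int before summing so the result is an integer count, and the column name count_of_black is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'animal': ['dog', 'dog', 'cat', 'dog', 'cat'],
    'color': ['brown', 'black', 'white', 'black', 'black']})

counts = (df['color'].eq('black')
            .astype(int)                # True -> 1, False -> 0
            .groupby(df['animal'])
            .sum()                      # number of black rows per animal
            .rename('count_of_black')   # hypothetical column name
            .reset_index())
print(counts)
```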

Update 2

pd.crosstab(df.animal, df.color, normalize='index')  # append ['black'] to keep only the black column
Out[128]: 
color      black     brown  white
animal                           
cat     0.500000  0.000000    0.5
dog     0.666667  0.333333    0.0
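Selecting the black column from the row-normalized crosstab gives the proportion directly. A minimal sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'animal': ['dog', 'dog', 'cat', 'dog', 'cat'],
    'color': ['brown', 'black', 'white', 'black', 'black']})

# normalize='index' divides each row by its total, so each row sums to 1
black_share = pd.crosstab(df['animal'], df['color'], normalize='index')['black']
print(black_share)
```

This is convenient when you want the proportions of every color at once; for a single color the boolean-mean approach avoids building the full table.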

Upvotes: 2
