Reputation: 329
I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({
'animal': ['dog', 'dog', 'cat', 'dog', 'cat'],
'color': ['brown', 'black', 'white', 'black', 'black']})
I am trying to write a groupby function like this:
df.groupby('animal').agg(
proportion_of_black=('color', lambda x: 1 if x == 'black' else 0)).reset_index()
It returns the following error message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Where is my code going wrong?
Upvotes: 1
Views: 5937
Reputation: 79318
Since your question asks for proportion and not counts, you should do:
df.groupby(['animal']).agg(
proportion=('color', lambda x: x.eq('black').mean())).reset_index()
animal proportion
0 cat 0.500000
1 dog 0.666667
Upvotes: 6
Reputation: 150785
Where is my code going wrong? When you do:
df.groupby('animal').agg(
proportion_of_black=('color', lambda x: 1 if x == 'black' else 0))
Here x is the color series for each animal, e.g. df.loc[df['animal']=='dog', 'color']. So x == 'black' is a boolean series. However, if in Python only accepts a single boolean, and pandas doesn't know how to convert the series x == 'black' into a single boolean to pass to if, so it raises the error you see.
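To make the explanation above concrete, here is a minimal sketch that reproduces the error outside of groupby. It assumes the dataframe from the question is named df; the names mask, has_black and share_black are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'animal': ['dog', 'dog', 'cat', 'dog', 'cat'],
    'color': ['brown', 'black', 'white', 'black', 'black']})

# The series that the lambda receives for the 'dog' group
x = df.loc[df['animal'] == 'dog', 'color']

# Element-wise comparison: a boolean Series, not a single bool
mask = x == 'black'
print(mask.tolist())

try:
    # `if` asks for one truth value, which a multi-element Series cannot give
    result = 1 if mask else 0
except ValueError as err:
    result = None
    error_message = str(err)
    print(error_message)

# Reducing the Series to a single value removes the ambiguity
has_black = bool(mask.any())   # True if at least one row is black
share_black = mask.mean()      # fraction of rows that are black
```

Which reduction to use depends on the question being asked: any() answers "is there a black one at all", while mean() gives the proportion.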
How to fix your code: apply should be avoided, even after groupby(). In your case, you can get the proportion of black with mean():
df['color'].eq('black').groupby(df['animal']).mean()
Output:
animal
cat 0.500000
dog 0.666667
Name: color, dtype: float64
Upvotes: 2
Reputation: 323326
Fix your code with any:
df.groupby('animal').agg(
proportion_of_black=('color', lambda x: 1 if any(x == 'black') else 0)).reset_index()
If you need the count of black:
df.groupby('animal').agg(
proportion_of_black=('color', lambda x: sum(x == 'black') )).reset_index()
Out[124]:
animal proportion_of_black
0 cat 1
1 dog 2
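As a side-by-side comparison of the variants above, the following sketch computes the any flag, the count and the proportion in one pass. It assumes the question's dataframe is named df; is_black and summary are illustrative names, and the string aggregations are used here instead of lambdas purely as a convenience.

```python
import pandas as pd

df = pd.DataFrame({
    'animal': ['dog', 'dog', 'cat', 'dog', 'cat'],
    'color': ['brown', 'black', 'white', 'black', 'black']})

# Boolean series: True where the color is black
is_black = df['color'].eq('black')

# Group the boolean series by animal and reduce it three ways:
#   any  -> is there at least one black animal in the group
#   sum  -> how many black animals are in the group
#   mean -> what fraction of the group is black
summary = is_black.groupby(df['animal']).agg(['any', 'sum', 'mean'])
print(summary)
```

Note that any only tells you whether black occurs in a group, so it cannot stand in for a proportion.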
Update 2
pd.crosstab(df.animal,df.color,normalize='index') # ['black']
Out[128]:
color black brown white
animal
cat 0.500000 0.000000 0.5
dog 0.666667 0.333333 0.0
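The commented-out ['black'] in the crosstab call hints at selecting a single column. A small sketch of that step, assuming the question's dataframe is named df (shares and black_share are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'animal': ['dog', 'dog', 'cat', 'dog', 'cat'],
    'color': ['brown', 'black', 'white', 'black', 'black']})

# Row-normalised frequency table: each row (animal) sums to 1
shares = pd.crosstab(df['animal'], df['color'], normalize='index')

# Keep only the share of black per animal
black_share = shares['black']
print(black_share)
```

This gives the same per-animal proportions as the groupby approaches, with the other colors available in shares if they are needed later.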
Upvotes: 2