Reputation: 171
I'm applying a simple function to a grouped pandas DataFrame. Below is what I'm trying. Even if I cut the function down to a single step, I keep getting the same error. Any direction would be super helpful.
def udf_pd(df_group):
    if (df_group['A'] - df_group['B']) > 1:
        df_group['D'] = 'Condition-1'
    elif df_group.A == df_group.C:
        df_group['D'] = 'Condition-2'
    else:
        df_group['D'] = 'Condition-3'
    return df_group
final_df = df.groupby(['id1','id2']).apply(udf_pd)
final_df = final_df.reset_index()
ValueError: The truth value of a Series is ambiguous. Use a.empty,
a.bool(), a.item(), a.any() or a.all().
Upvotes: 1
Views: 88
Reputation: 30991
Note that in groupby.apply the function is applied to the whole group. On the other hand, each if condition must boil down to a single value (not to any Series of True/False values).
So each comparison of 2 columns in this function must be supplemented with e.g. all() or any(), like in the example below:
def udf_pd(df_group):
    if (df_group.A - df_group.B > 1).all():
        df_group['D'] = 'Condition-1'
    elif (df_group.A == df_group.C).all():
        df_group['D'] = 'Condition-2'
    else:
        df_group['D'] = 'Condition-3'
    return df_group
Of course, the function can return the whole group "extended" by a new column. In that case the single value assigned to the new column is broadcast, so every row in the current group receives it.
I created a test DataFrame:
   id1  id2  A  B  C
0    1    1  5  3  0
1    1    1  7  5  4
2    1    2  3  4  3
3    1    2  4  5  4
4    2    1  2  4  3
5    2    1  4  5  4
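In this example:
For the group id1 == 1, id2 == 1, A - B equals 2 in both rows, so the first condition holds for all rows.
For the group id1 == 1, id2 == 2, the first condition fails, but A == C holds in both rows.
For the group id1 == 2, id2 == 1, A - B is negative in both rows and A == C holds in only one of them, so neither all() check passes.
Hence the result of df.groupby(['id1','id2']).apply(udf_pd) is: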
   id1  id2  A  B  C            D
0    1    1  5  3  0  Condition-1
1    1    1  7  5  4  Condition-1
2    1    2  3  4  3  Condition-2
3    1    2  4  5  4  Condition-2
4    2    1  2  4  3  Condition-3
5    2    1  4  5  4  Condition-3
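For anyone who wants to check the table above, here is a minimal sketch that rebuilds the test DataFrame and computes the same labels. It is not the answer's exact code: `label_group` is a hypothetical helper that returns one label per group, and the groups are iterated explicitly instead of going through groupby.apply, since the way apply treats the grouping columns has changed between pandas versions.

```python
import pandas as pd

# Test data copied from the table above.
df = pd.DataFrame({
    'id1': [1, 1, 1, 1, 2, 2],
    'id2': [1, 1, 2, 2, 1, 1],
    'A':   [5, 7, 3, 4, 2, 4],
    'B':   [3, 5, 4, 5, 4, 5],
    'C':   [0, 4, 3, 4, 3, 4],
})

def label_group(df_group):
    """Hypothetical helper: reduce each comparison with .all() to one bool."""
    if (df_group['A'] - df_group['B'] > 1).all():
        return 'Condition-1'
    elif (df_group['A'] == df_group['C']).all():
        return 'Condition-2'
    return 'Condition-3'

# One label per (id1, id2) group; keys are tuples because we group by two columns.
labels = {key: label_group(group) for key, group in df.groupby(['id1', 'id2'])}

# Broadcast each group's label back to its rows.
df['D'] = [labels[(i1, i2)] for i1, i2 in zip(df['id1'], df['id2'])]

print(df)
```

Swapping `.all()` for `.any()` in the helper changes the meaning to "at least one row in the group satisfies the condition", so pick whichever matches the intended rule.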
Upvotes: 2
Reputation: 322
I've encountered this error before, and my understanding is that pandas isn't sure which value it's supposed to run the conditional against. You're probably going to want to use .any() or .all(). Consider these examples:
>>> a = pd.Series([0,0,3])
>>> b = pd.Series([1,1,1])
>>> a - b
0   -1
1   -1
2    2
dtype: int64
>>> (a - b) >= 1
0    False
1    False
2     True
dtype: bool
You can see that the truthiness of (a - b) >= 1 is ambiguous: the first two elements of the Series are False while the last one is True.
Using .any() or .all() reduces the entire Series to a single boolean.
>>> ((a - b) >= 1).any()
True
>>> ((a - b) >= 1).all()
False
.any() checks whether at least one element in the Series is True, while .all() checks whether every element is True, which in this example they are not.
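To tie this back to the original error, here is a small sketch using the same two Series as above. It shows that putting the boolean Series directly in an `if` raises the ValueError from the question, while the reduced forms work; the variable names (`mask`, `error_message`, and so on) are just for illustration.

```python
import pandas as pd

a = pd.Series([0, 0, 3])
b = pd.Series([1, 1, 1])
mask = (a - b) >= 1  # element-wise result: False, False, True

error_message = None
try:
    # `if` needs a single True/False, but mask holds three values.
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)

any_true = bool(mask.any())  # at least one element is True
all_true = bool(mask.all())  # not every element is True

print(error_message)
print(any_true, all_true)
```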
You can also check out this post for more information: Pandas Boolean .any() .all()
Upvotes: 1