Reputation: 11970
I am learning pandas and got stuck with this problem here.
I created a dataframe that tracks all users and the number of times they did something.
To better understand the problem I created this example:
import pandas as pd
data = [
{'username': 'me', 'bought_apples': 2, 'bought_pears': 0},
{'username': 'you', 'bought_apples': 1, 'bought_pears': 1}
]
df = pd.DataFrame(data)
df['bought_something'] = df['bought_apples'] > 0 or df['bought_pears'] > 0
In the last line I want to add a column that indicates whether the user has bought anything at all.
This error pops up:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I understand the point about ambiguity in a pandas Series (also explained here), but I could not relate it to this problem.
Interestingly this works
df['bought_something'] = df['bought_apples'] > 0
Can anyone help me?
Upvotes: 11
Views: 34570
Reputation: 24742
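The reason for that error is that you use 'or' to join two boolean vectors (Series) instead of two boolean scalars. 'or' needs a single True or False from each operand, and a Series with several values cannot provide one. That's why it says the truth value is ambiguous.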
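To make this concrete, here is a minimal sketch that rebuilds the question's data and contrasts the two operators. The variable names apples, pears, raised and either are only illustrative and are not from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'username': ['me', 'you'],
    'bought_apples': [2, 1],
    'bought_pears': [0, 1],
})

apples = df['bought_apples'] > 0   # boolean Series, one value per row
pears = df['bought_pears'] > 0

# `or` needs a single True/False from its left operand, so Python calls
# bool() on the Series, and pandas raises ValueError instead of guessing.
try:
    apples or pears
    raised = False
except ValueError:
    raised = True

# `|` is applied element by element and returns a new boolean Series.
either = apples | pears
```

The single comparison in the question works because it never asks for one overall truth value; it simply produces a boolean Series that can be assigned as a column.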
Upvotes: 2
Reputation: 394041
You can call sum row-wise and compare if this is greater than 0:
In [105]:
df['bought_something'] = df[['bought_apples','bought_pears']].sum(axis=1) > 0
df
Out[105]:
bought_apples bought_pears username bought_something
0 2 0 me True
1 1 1 you True
Regarding your original attempt, the error message is telling you that the Python keyword or needs a single truth value from each operand, and a Series holding several values has no unambiguous one. If you want to or boolean conditions element-wise, you need to use the bit-wise operator | and wrap the conditions in parentheses, because | binds more tightly than comparisons such as >:
In [111]:
df['bought_something'] = ((df['bought_apples'] > 0) | (df['bought_pears'] > 0))
df
Out[111]:
bought_apples bought_pears username bought_something
0 2 0 me True
1 1 1 you True
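As a possible variation on both approaches, the comparison can be applied to several columns at once and reduced per row with any(axis=1). This is only a sketch: the 'nobody' row is a hypothetical addition so that one result comes out False, and cols and via_or are illustrative names.

```python
import pandas as pd

# 'nobody' is a hypothetical extra row, added only so one result is False.
df = pd.DataFrame({
    'username': ['me', 'you', 'nobody'],
    'bought_apples': [2, 1, 0],
    'bought_pears': [0, 1, 0],
})

cols = ['bought_apples', 'bought_pears']

# Compare every cell with 0, then ask per row whether any cell is True.
df['bought_something'] = (df[cols] > 0).any(axis=1)

# The same result written out with the element-wise | operator.
via_or = (df['bought_apples'] > 0) | (df['bought_pears'] > 0)
```

Unlike summing the columns, this checks each column separately, so it would still behave as expected if a column could ever hold negative numbers that cancel out in the sum.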
Upvotes: 20