Reputation: 673
I am trying to color points of a pandas dataframe depending on TWO conditions. Example:
IF value of col1 > a AND value of col2 - value of col3 < b THEN value of col4 = string
ELSE value of col4 = other string.
I have tried so many different ways now and everything I found online was only depending on one condition.
My example code always raises the Error:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Here's the code. Tried several variations without success.
df = pd.DataFrame()
df['A'] = range(10)
df['B'] = range(11,21,1)
df['C'] = range(20,10,-1)
borderE = 3.
ex = 0.
#print df
df['color'] = np.where(all([df.A < borderE, df.B - df.C < ex]), 'r', 'b')
Btw: I understand, what it says but not how to handle it.
Upvotes: 24
Views: 83153
Reputation: 23131
The error in the question occurred because OP used all()
function instead of the bitwise-&
operator to chain multiple comparisons together.1
Another way to chain multiple comparisons is to evaluate an expression via eval()
method. The following checks if A value is less than borderE
AND (B - C)
value is less than ex
for each row.
df.eval('A < @borderE and B - C < @ex')
Using its outcome as a numpy.where()
condition, you can assign values by
df['color'] = np.where(df.eval('A < @borderE and B - C < @ex'), 'r', 'b')
If you installed numexpr (pip install numexpr
) as recommended in the pandas documentation, this method should perform as well (and better if you have a lot of conditions to reduce) as chaining via &
. The advantage is that (i) it's much more readable (imo) and (ii) you don't need to worry about brackets ()
, and
/&
etc. anymore because the order of precedence inside the string expression is the same as that in Python.
1: all()
by design doesn't work on pandas Series (or numpy arrays) because the pandas developers felt that it's ambiguous when to return True: if any element is True or if all elements are True; in fact, it's not clear if a single bool should be returned or a boolean Series should be returned when you call all()
. You can explicitly tell it that you want a single bool returned by calling items(), all() etc. on the Series object (which the error is saying) but if you do so (e.g. all([(df.A < borderE).all(), (df.B - df.C < ex).all()])
), the conditions will be aggregated into a single boolean value, not the desired boolean vector.
Upvotes: 0
Reputation: 21
for me @Alexander 's solution didn't work in my specific case, I had to use lists to pass the two conditions, and then transpose the output, My solution also work for this case:
df['color'] = np.where([df.A < borderE] and [df.B - df.C < ex], 'r', 'b').transpose()
Yields:
A B C color
0 0 11 20 r
1 1 12 19 r
2 2 13 18 r
3 3 14 17 b
4 4 15 16 b
5 5 16 15 b
6 6 17 14 b
7 7 18 13 b
8 8 19 12 b
9 9 20 11 b
Upvotes: 0
Reputation: 109546
Selection criteria uses Boolean indexing:
df['color'] = np.where(((df.A < borderE) & ((df.B - df.C) < ex)), 'r', 'b')
>>> df
A B C color
0 0 11 20 r
1 1 12 19 r
2 2 13 18 r
3 3 14 17 b
4 4 15 16 b
5 5 16 15 b
6 6 17 14 b
7 7 18 13 b
8 8 19 12 b
9 9 20 11 b
Upvotes: 34
Reputation: 4090
wrap the IF in a function and apply it:
def color(row):
borderE = 3.
ex = 0.
if (row.A > borderE) and( row.B - row.C < ex) :
return "somestring"
else:
return "otherstring"
df.loc[:, 'color'] = df.apply(color, axis = 1)
Yields:
A B C color
0 0 11 20 otherstring
1 1 12 19 otherstring
2 2 13 18 otherstring
3 3 14 17 otherstring
4 4 15 16 somestring
5 5 16 15 otherstring
6 6 17 14 otherstring
7 7 18 13 otherstring
8 8 19 12 otherstring
9 9 20 11 otherstring
Upvotes: 14