Reputation: 21
I want to create a binary column that is 1 when the values of both columns in the following table fall within the same range. For example, if the value in cat_1 is between 5 and 10 and the value in cat_2 is also between 5 and 10, the new column should be 1; otherwise, it should be 0.
| cat_1 | cat_2 | [5-10] (new column to be created) |
| ----- | ----- | --------------------------------- |
| 5 | 10 | 1 |
| 7 | 9 | 1 |
| 1 | 7 | 0 |
So far, I have tried the following code, but it raises an error:
df.loc[((df['cat_1'] >= 5 & df['cat_1'] <= 10)
        & (df['cat_2'] >= 5 & df['cat_2'] <= 10)), '[5-10]'] = 1
and here is the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Upvotes: 0
Views: 648
Reputation: 2484
In this case, you can also use apply() to make a new column based on the other columns.
Here, I pass the values of the two columns, cat_1 and cat_2, to a function that computes the new column, as follows:
import pandas as pd

df = pd.DataFrame(
    {
        'cat_1': [5, 7, 1],
        'cat_2': [10, 9, 7],
    }
)

def check_in_range(x):
    # x is one row holding the values of cat_1 and cat_2
    cat_1, cat_2 = x
    start = 5
    end = 10
    if (start <= cat_1 <= end) and (start <= cat_2 <= end):
        return 1
    else:
        return 0

df['new'] = df[['cat_1', 'cat_2']].apply(check_in_range, axis=1)
print(df)
#    cat_1  cat_2  new
# 0      5     10    1
# 1      7      9    1
# 2      1      7    0
Upvotes: -1
Reputation: 1352
The reason you're getting an error is that & has higher precedence than >=. To fix your snippet, add parentheses around each column comparison:
df.loc[((df['cat_1'] >= 5) & (df['cat_1'] <= 10)
        & (df['cat_2'] >= 5) & (df['cat_2'] <= 10)), '[5-10]'] = 1
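To make the precedence point concrete, here is a minimal sketch that reuses the three sample rows from the question. The names `raised` and `mask` are only illustrative, and initialising the column to 0 first is my assumption about the desired result, so that rows outside the range hold 0 rather than NaN.

```python
import pandas as pd

df = pd.DataFrame({"cat_1": [5, 7, 1], "cat_2": [10, 9, 7]})

# Without parentheses, `a >= 5 & a <= 10` is parsed as the chained
# comparison `a >= (5 & a) <= 10`, which needs the truth value of a
# Series and therefore raises ValueError.
try:
    df["cat_1"] >= 5 & df["cat_1"] <= 10
    raised = False
except ValueError:
    raised = True

# With each comparison parenthesised, & combines the boolean Series
# element by element.
mask = (
    (df["cat_1"] >= 5) & (df["cat_1"] <= 10)
    & (df["cat_2"] >= 5) & (df["cat_2"] <= 10)
)

df["[5-10]"] = 0            # default for rows outside the range
df.loc[mask, "[5-10]"] = 1  # rows where both columns are in range
print(df)
```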
Even better, define the new column as a whole, without subsetting using .loc. Consider e.g.:
df['[5-10]'] = df['cat_1'].between(5, 10) & df['cat_2'].between(5, 10)
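As a quick check on the sample data from the question, the sketch below shows that between() includes both endpoints by default and that the combined result is a boolean Series. The variable name `in_range` is only illustrative, and the cast to int is an optional step for when 1/0 is wanted instead of True/False.

```python
import pandas as pd

df = pd.DataFrame({"cat_1": [5, 7, 1], "cat_2": [10, 9, 7]})

# between() is inclusive on both ends by default, so 5 and 10 count
# as inside the range.
in_range = df["cat_1"].between(5, 10) & df["cat_2"].between(5, 10)
print(in_range.dtype)  # bool

# Optional: store 1/0 instead of True/False.
df["[5-10]"] = in_range.astype(int)
print(df)
```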
Upvotes: 1
Reputation: 9363
pandas uses the bitwise operators (& and |) for element-wise logic, and each condition must be wrapped in parentheses; otherwise an error is raised.
Try wrapping each condition in (), like (df['cat_1'] >= 5) & (...), to see if the error goes away.
However, your operation can be simplified with the between function:
df['[5-10]'] = (df.cat_1.between(5, 10) & df.cat_2.between(5, 10)).astype(int)
Upvotes: 3