Ryan

Reputation: 21

How do I create a new dataframe column based on two other columns?

I want to create a binary column that is 1 if the values of both columns in the following table fall within the same range. For example, if the value in cat_1 is between 5 and 10 and the value in cat_2 is also between 5 and 10, the new column should be 1; otherwise, it should be 0.

| cat_1 | cat_2 | [5-10] (new column to be created) |
| ----- | ----- | --------------------------------- |
| 5     | 10    | 1                                 |
| 7     | 9     | 1                                 |
| 1     | 7     | 0                                 |

So far, I have tried the following code, but it returns an error:

df.loc[((df['cat_1l'] >= 5 & df['cat_1'] <= 10) 
       & (df['cat_2'] >= 5 & result['cat_2'] <= 10)), '[5-10]' = 1

and here is the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 0

Views: 648

Answers (3)

Park

Reputation: 2484

In this case, you can also use apply() to make a new column based on the other columns.

Here, I pass the values of the two columns, cat_1 and cat_2, to a function that computes the new column, as follows:

import pandas as pd

df = pd.DataFrame(
    {
        'cat_1': [5, 7, 1],
        'cat_2': [10, 9, 7],
    }
)


def check_in_range(x):
    cat_1, cat_2 = x
    start = 5
    end = 10
    if (start <= cat_1 <= end) and (start <= cat_2 <= end):
        return 1
    else:
        return 0

df['new'] = df[['cat_1', 'cat_2']].apply(check_in_range, axis=1)

print(df)
#   cat_1  cat_2  new
#0      5     10    1
#1      7      9    1
#2      1      7    0

Upvotes: -1

ozacha

Reputation: 1352

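The error occurs because & has higher precedence than >= and <=, so 5 & df['cat_1'] is evaluated before the comparisons. What remains is a chained comparison, which Python joins with an implicit "and", and that asks for the truth value of a whole Series. To fix your snippet, add parentheses around each column comparison: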

df.loc[((df['cat_1'] >= 5) & (df['cat_1'] <= 10)
        & (df['cat_2'] >= 5) & (df['cat_2'] <= 10)), '[5-10]'] = 1
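
As a minimal sketch of the parenthesized .loc approach, assuming the three sample rows from the question (the DataFrame construction and the name mask are illustrative assumptions): rows that .loc does not select are left as NaN in a newly created column, so the sketch sets a default of 0 first.

```python
import pandas as pd

# Sample data assumed from the table in the question.
df = pd.DataFrame({'cat_1': [5, 7, 1], 'cat_2': [10, 9, 7]})

# Each comparison is parenthesized so that & combines boolean Series.
mask = ((df['cat_1'] >= 5) & (df['cat_1'] <= 10)
        & (df['cat_2'] >= 5) & (df['cat_2'] <= 10))

df['[5-10]'] = 0             # default for rows outside the range
df.loc[mask, '[5-10]'] = 1   # overwrite only the matching rows

print(df)
```

Without the default assignment, the third row would hold NaN rather than 0.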

Better still, define the new column as a whole rather than assigning into a subset with .loc. For example:

df['[5-10]'] = df['cat_1'].between(5, 10) & df['cat_2'].between(5, 10)

Upvotes: 1

Emma

Reputation: 9363

pandas uses the bitwise operators (& and |) for element-wise logic, and each condition must be wrapped in parentheses; otherwise this error is raised.

Try wrapping each condition in (), like (df['cat_1'] >= 5) & (...), to see if the error goes away.
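
To illustrate why the parentheses matter, here is a small sketch on a standalone Series (the names s, raised and ok are assumptions for illustration). Without parentheses the expression is parsed as s >= (5 & s) <= 10, a chained comparison that needs the truth value of a Series.

```python
import pandas as pd

s = pd.Series([5, 7, 1])

try:
    # Parsed as: s >= (5 & s) <= 10, i.e. (s >= (5 & s)) and ((5 & s) <= 10)
    s >= 5 & s <= 10
    raised = False
except ValueError:
    raised = True

# Parenthesized comparisons produce boolean Series that & can combine.
ok = (s >= 5) & (s <= 10)

print(raised)
print(ok.tolist())
```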

However, your operation can be simplified with the between function:

df['[5-10]'] = (df.cat_1.between(5, 10) & df.cat_2.between(5, 10)).astype(int)
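
If more columns or other ranges need the same check, the idea could be wrapped in a small helper. This is only a sketch: flag_in_range is a hypothetical name, not a pandas function, and it assumes both bounds are inclusive, which matches the default behaviour of between.

```python
import pandas as pd


def flag_in_range(df, cols, low, high):
    """Return 1 where every column in cols lies in [low, high], else 0.

    Hypothetical helper for illustration; bounds are inclusive.
    """
    block = df[cols]
    return ((block >= low) & (block <= high)).all(axis=1).astype(int)


# Sample data assumed from the table in the question.
df = pd.DataFrame({'cat_1': [5, 7, 1], 'cat_2': [10, 9, 7]})
df['[5-10]'] = flag_in_range(df, ['cat_1', 'cat_2'], 5, 10)

print(df)
```

The comparison runs on the whole sub-frame at once, so no row-wise apply is needed.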

Upvotes: 3
