corezal
corezal

Reputation: 31

Python - convert values of a column based on multiple condition of other columns

here is my data:

df = pd.DataFrame()
df['x1'] = [1,2,3,4]
df['x2'] = [2,5,1,4]
df['x3'] = [1,1,1,1]
df['x4'] = [2,5,1,4]
df['x5'] = [1,2,2,4]
df['test'] = 'N'

and data will look like:

   x1  x2  x3  x4  x5  test
0   1   2   1   2   1   N
1   2   5   1   5   2   N
2   3   1   1   1   2   N
3   4   4   1   4   4   N

From this data, I'd like to convert 'test' column value to 'Y' if each row (all columns) matches with the below condition.

xs = ['x1','x2','x3','x4','x5']

if any of value in x1, x2, x3, x4, x5 in each row are matched with this condition:

3 <= df[xs] < 5 then, convert 'test' value to 'Y'.

So I tried:

df.loc[any(df[xs] >= 3 & df[xs] < 5), 'test'] ='Y'

but it occurs error (The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().).

My ideal output must be:

   x1  x2  x3  x4  x5  test
0   1   2   1   2   1   N
1   2   5   1   5   2   N
2   3   1   1   1   2   Y
3   4   4   1   4   4   Y

Could you help me out ? My real data includes hundreds of columns, so I would prefer using df[xs] for this, not separate condition like df['x1'] & df['x2'] & df['x3'] ...Thank you so much!

Upvotes: 1

Views: 128

Answers (3)

Vivek Kalyanarangan
Vivek Kalyanarangan

Reputation: 9081

Use np.where-

df['test'] = np.where(((df[xs] >= 3) & (df[xs] < 5)).any(axis=1), 'Y', 'N')

Output

    x1  x2  x3  x4  x5 test
0   1   2   1   2   1    N
1   2   5   1   5   2    N
2   3   1   1   1   2    Y
3   4   4   1   4   4    Y

Upvotes: 1

teepee
teepee

Reputation: 2714

Since you have so many columns, you probably want to avoid having to define the xs column vector as well. The best way to do this is to 'melt' the dataframe so that you have everything in a tall, tidy list. Then you can apply pandas vector operations. You also don't need to have the 'test' column there first. You can add it later as it is a function of the operation you are applying to the rest of the columns:

# you can define your df more easily like this:
df = pd.DataFrame({
    'x1': [1,2,3,4],
    'x2': [2,5,1,4],
    'x3': [1,1,1,1],
    'x4': [2,5,1,4],
    'x5': [1,2,2,4],
})

# melt and assess
df = df.reset_index()
df_melt = df.melt(id_vars='index')
df_melt['test_pass'] = (df_melt.value < 5) & (df_melt.value >=3)
test_results = df_melt.groupby('index')['test_pass'].any().map({True: 'Y', False: 'N'})

# then merge these test results back into your original dataframe:
df = df.merge(test_results, on='index')

Upvotes: 0

Oddaspa
Oddaspa

Reputation: 888

I have recreated your problem and will go through it step by step.

First you create a mask where your condition is true:

mask = (df[xs] >= 3) & (df[xs] < 5)
mask
    x1      x2      x3      x4      x5
0   False   False   False   False   False
1   False   False   False   False   False
2   True    False   False   False   False
3   True    True    False   True    True

Then you can transpose (to get row wise) the frame to get this, but we can work with only the mask:

>>> df[mask].T
        0   1   2   3
x1      NaN NaN 3   4
x2      NaN NaN NaN 4
x3      NaN NaN NaN NaN
x4      NaN NaN NaN 4
x5      NaN NaN NaN 4
test    NaN NaN NaN NaN

Then you can apply the any function.

df['test'] = mask.T.any()
    x1  x2  x3  x4  x5  test
0   1   2   1   2   1   False
1   2   5   1   5   2   False
2   3   1   1   1   2   True
3   4   4   1   4   4   True

Upvotes: 2

Related Questions