Reputation: 31
here is my data:
df = pd.DataFrame()
df['x1'] = [1,2,3,4]
df['x2'] = [2,5,1,4]
df['x3'] = [1,1,1,1]
df['x4'] = [2,5,1,4]
df['x5'] = [1,2,2,4]
df['test'] = 'N'
and data will look like:
x1 x2 x3 x4 x5 test
0 1 2 1 2 1 N
1 2 5 1 5 2 N
2 3 1 1 1 2 N
3 4 4 1 4 4 N
From this data, I'd like to convert 'test' column value to 'Y' if each row (all columns) matches with the below condition.
xs = ['x1','x2','x3','x4','x5']
if any of value in x1, x2, x3, x4, x5 in each row are matched with this condition:
3 <= df[xs] < 5 then
, convert 'test' value to 'Y'.
So I tried:
df.loc[any(df[xs] >= 3 & df[xs] < 5), 'test'] ='Y'
but it occurs error (The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().).
My ideal output must be:
x1 x2 x3 x4 x5 test
0 1 2 1 2 1 N
1 2 5 1 5 2 N
2 3 1 1 1 2 Y
3 4 4 1 4 4 Y
Could you help me out ? My real data includes hundreds of columns, so I would prefer using df[xs]
for this, not separate condition like df['x1'] & df['x2'] & df['x3']
...Thank you so much!
Upvotes: 1
Views: 128
Reputation: 9081
Use np.where-
df['test'] = np.where(((df[xs] >= 3) & (df[xs] < 5)).any(axis=1), 'Y', 'N')
Output
x1 x2 x3 x4 x5 test
0 1 2 1 2 1 N
1 2 5 1 5 2 N
2 3 1 1 1 2 Y
3 4 4 1 4 4 Y
Upvotes: 1
Reputation: 2714
Since you have so many columns, you probably want to avoid having to define the xs column vector as well. The best way to do this is to 'melt' the dataframe so that you have everything in a tall, tidy list. Then you can apply pandas vector operations. You also don't need to have the 'test' column there first. You can add it later as it is a function of the operation you are applying to the rest of the columns:
# you can define your df more easily like this:
df = pd.DataFrame({
'x1': [1,2,3,4],
'x2': [2,5,1,4],
'x3': [1,1,1,1],
'x4': [2,5,1,4],
'x5': [1,2,2,4],
})
# melt and assess
df = df.reset_index()
df_melt = df.melt(id_vars='index')
df_melt['test_pass'] = (df_melt.value < 5) & (df_melt.value >=3)
test_results = df_melt.groupby('index')['test_pass'].any().map({True: 'Y', False: 'N'})
# then merge these test results back into your original dataframe:
df = df.merge(test_results, on='index')
Upvotes: 0
Reputation: 888
I have recreated your problem and will go through it step by step.
First you create a mask where your condition is true:
mask = (df[xs] >= 3) & (df[xs] < 5)
mask
x1 x2 x3 x4 x5
0 False False False False False
1 False False False False False
2 True False False False False
3 True True False True True
Then you can transpose (to get row wise) the frame to get this, but we can work with only the mask:
>>> df[mask].T
0 1 2 3
x1 NaN NaN 3 4
x2 NaN NaN NaN 4
x3 NaN NaN NaN NaN
x4 NaN NaN NaN 4
x5 NaN NaN NaN 4
test NaN NaN NaN NaN
Then you can apply the any function.
df['test'] = mask.T.any()
x1 x2 x3 x4 x5 test
0 1 2 1 2 1 False
1 2 5 1 5 2 False
2 3 1 1 1 2 True
3 4 4 1 4 4 True
Upvotes: 2