user12691361

How to populate a new column from conditions based on two existing columns in Pandas?

I am trying to create a new column based on conditions on two existing columns, but I get an error when using np.where. Is there another way to achieve this?

Input:

change1 change2
yes     yes
yes     no
no      yes
no      yes

Expected Output:

change1 change2 change3
yes      yes      ok
yes      no       not ok
no       yes      not ok
no       yes      not ok

Code:

import pandas as pd
import numpy as np



df1=pd.read_csv('test2.txt',sep='\t')
df1['change1'] = df1['change1'].astype(str)
df1['change2'] = df1['change2'].astype(str)


df['change3'] = np.where(df1['change1']=='yes' & df1['change2'] == 'yes', 'ok', 'not ok')

print(df1)

Error:

cannot compare a dtyped [object] array with a scalar of type [bool]

Upvotes: 2

Views: 193

Answers (3)

kederrac

Reputation: 17322

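You can use DataFrame.apply row by row:

# 'ok' only when both columns are 'yes'; comparing the columns to each other would also accept no/no
df['change3'] = df.apply(lambda x: 'ok' if x['change1'] == 'yes' and x['change2'] == 'yes' else 'not ok', axis=1)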


Upvotes: 2

ansev

Reputation: 30920

Use DataFrame.eq and DataFrame.all. Because these are method calls, they sidestep the operator-precedence problem behind the error and need no extra parentheses.

df['change3'] = np.where(df.eq('yes').all(axis=1), 'ok', 'not ok')
# If you only need to check selected columns:
# df['change3'] = np.where(df[['change1', 'change2']].eq('yes').all(axis=1),
#                          'ok', 'not ok')

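Without DataFrame.all, wrap each comparison in parentheses, because & binds more tightly than ==:

df['change3'] = np.where((df['change1'] == 'yes') & (df['change2'] == 'yes'),
                         'ok', 'not ok')

or use Series.eq, which needs no extra parentheses:

df['change3'] = np.where(df['change1'].eq('yes') & df['change2'].eq('yes'),
                         'ok', 'not ok')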
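
For reference, here is a minimal self-contained sketch of the parenthesized version. It assumes the sample data from the question, built inline instead of being read from test2.txt, and the name both_yes is only illustrative. In the question's expression, Python groups 'yes' & df1['change2'] first, which is what triggers the error.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: not read from test2.txt).
df = pd.DataFrame({
    "change1": ["yes", "yes", "no", "no"],
    "change2": ["yes", "no", "yes", "yes"],
})

# Parentheses make each comparison produce a boolean Series before `&` combines them.
both_yes = (df["change1"] == "yes") & (df["change2"] == "yes")

# np.where picks 'ok' where the mask is True and 'not ok' elsewhere.
df["change3"] = np.where(both_yes, "ok", "not ok")

print(df)
```

Keeping the mask in its own variable also makes it easy to inspect the condition before labelling the rows.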

You can also use Series.map or Series.replace:

df['change3'] = df.eq('yes').all(axis=1).map({True: 'ok', False: 'not ok'})
# df['change3'] = df.eq('yes').all(axis=1).replace({True: 'ok', False: 'not ok'})

print(df)

#   change1 change2 change3
# 0     yes     yes      ok
# 1     yes      no  not ok
# 2      no     yes  not ok
# 3      no     yes  not ok

Upvotes: 4

Erfan

Reputation: 42886

Use DataFrame.replace to encode yes/no as 1/0, then check that every value in each row is truthy:

df1['change3'] = np.where(df1.replace({'yes': 1, 'no': 0}).all(axis=1), 
                          'ok', 
                          'not ok')

Or with replace and sum, where a row total greater than 1 means both columns are yes:

df1['change3'] = np.where(df1.replace({'yes': 1, 'no': 0}).sum(axis=1).gt(1), 
                          'ok', 
                          'not ok')

Output:

  change1 change2 change3
0     yes     yes      ok
1     yes      no  not ok
2      no     yes  not ok
3      no     yes  not ok
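
A possible variant of the same binary-encoding idea, sketched here with Series.map on an explicit column list instead of DataFrame.replace on the whole frame. The names cols and encoded are illustrative, and the sample data is rebuilt inline from the question. Restricting the encoding to the two input columns keeps an already existing change3 column out of the row-wise check if the code is run twice.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: not read from test2.txt).
df1 = pd.DataFrame({
    "change1": ["yes", "yes", "no", "no"],
    "change2": ["yes", "no", "yes", "yes"],
})

# Illustrative: only these columns take part in the condition.
cols = ["change1", "change2"]

# Encode yes/no as 1/0 column by column; the result is an integer DataFrame.
encoded = df1[cols].apply(lambda s: s.map({"yes": 1, "no": 0}))

# A row is 'ok' only if every encoded value in it is non-zero.
df1["change3"] = np.where(encoded.all(axis=1), "ok", "not ok")

print(df1)
```

The sum-based check works on the same frame: encoded.sum(axis=1).eq(len(cols)) gives the same mask and does not depend on the number of columns being two.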

Upvotes: 3
