Reputation: 2613
I have a pandas DataFrame with two columns, as follows:
A B
Yes No
Yes Yes
No Yes
No No
NA Yes
NA NA
I want to create a new column based on these values: if either column is Yes, the new column should also be Yes. If both columns are No, the new column should be No. Finally, if both columns are NA, the new column should also be NA. The expected output for the data above is:
C
Yes
Yes
Yes
No
Yes
NA
I wrote a loop over the rows of the DataFrame that checks each value to build the new column. However, it takes a long time for 10M records. Is there a faster, more Pythonic way to achieve this?
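For anyone who wants to reproduce this, here is a minimal sketch of the sample data. It assumes NA means a real missing value (NaN) rather than the string 'NA', and the name `expected` is only for illustration:

```python
import numpy as np
import pandas as pd

# Sample frame from the question; NA is assumed to be a real missing value (NaN).
df = pd.DataFrame({
    "A": ["Yes", "Yes", "No", "No", np.nan, np.nan],
    "B": ["No", "Yes", "Yes", "No", "Yes", np.nan],
})

# Desired column C, written out by hand for comparison (hypothetical name).
expected = pd.Series(["Yes", "Yes", "Yes", "No", "Yes", np.nan], name="C")
```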
Upvotes: 4
Views: 562
Reputation: 26676
Another way of doing it. It is hard-coded, though.
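import numpy as np
# NA is assumed to be real NaN, so use isna() rather than == 'NaN'
conditions = ((df['A']=='Yes')|(df['B']=='Yes'), (df['A']=='No')&(df['B']=='No'), df['A'].isna()&df['B'].isna())
choicelist = ('Yes', 'No', None)
# default=None leaves unmatched rows missing instead of filling them with 0
df['C'] = np.select(conditions, choicelist, default=None)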
df
Upvotes: 0
Reputation: 323226
Something like
df.fillna('').max(axis=1)
Out[106]:
0 Yes
1 Yes
2 Yes
3 No
4 Yes
5
dtype: object
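This works because strings compare lexicographically, so '' < 'No' < 'Yes' and the row-wise maximum picks Yes whenever it is present. The last row comes back as an empty string rather than NA, though. A minimal sketch of one way to restore the missing value, assuming the columns hold plain Python strings with NaN for NA (the frame below just rebuilds the question's sample data):

```python
import numpy as np
import pandas as pd

# Rebuild the sample data; dtype=object keeps the values as plain Python strings.
df = pd.DataFrame({
    "A": ["Yes", "Yes", "No", "No", np.nan, np.nan],
    "B": ["No", "Yes", "Yes", "No", "Yes", np.nan],
}, dtype=object)

# Row-wise maximum after replacing NaN with '' (which sorts below 'No' and 'Yes').
c = df.fillna("").max(axis=1)

# Rows where both inputs were missing are '' at this point; turn them back into NaN.
df["C"] = c.mask(c == "")
```

Note that with this approach a row holding No in one column and NA in the other comes out as No, which the question does not specify either way.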
Upvotes: 7
Reputation: 153460
Try:
(df == 'Yes').eval('A | B').map({True: 'Yes', False: 'No'}).mask(df['A'].isna() & df['B'].isna())
Upvotes: 2