Haroon S.

Reputation: 2613

Pandas: Creating new column based on values from existing column

I have a pandas dataframe with two columns as follows:

A      B
Yes    No
Yes    Yes
No     Yes
No     No
NA     Yes
NA     NA

I want to create a new column based on these values: if either column is Yes, the new column should also be Yes. If both columns are No, the new column should be No. Finally, if both columns are NA, the new column should also be NA. Example output for the above data:

C
Yes
Yes
Yes
No
Yes
NA

I wrote a loop over the length of the dataframe that checks each value to build the new column. However, it takes a long time for 10M records. Is there a faster, more Pythonic way to achieve this?
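For anyone who wants to reproduce this, the sample data can be built roughly as below. This is only a sketch: it assumes the NA entries are real missing values (NaN) rather than the literal string 'NA'.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
# Assumption: NA is a real missing value (NaN), not the string 'NA'.
df = pd.DataFrame({
    "A": ["Yes", "Yes", "No", "No", np.nan, np.nan],
    "B": ["No", "Yes", "Yes", "No", "Yes", np.nan],
})
```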

Upvotes: 4

Views: 562

Answers (3)

wwnde

Reputation: 26676

Another way of doing it, though it is hard-coded:

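import numpy as np

# NA entries are assumed to be real missing values (NaN), so test them with isna(), not == 'NaN'
conditions = [(df['A'] == 'Yes') | (df['B'] == 'Yes'), (df['A'] == 'No') & (df['B'] == 'No'), df['A'].isna() & df['B'].isna()]
choicelist = ['Yes', 'No', None]
# default=None leaves any unmatched row (e.g. No/NA) missing instead of np.select's default of 0
df['C'] = np.select(conditions, choicelist, default=None)
df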


Upvotes: 0

BENY

Reputation: 323226

Something like

df.fillna('').max(axis=1)
Out[106]: 
0    Yes
1    Yes
2    Yes
3     No
4    Yes
5       
dtype: object
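Note that the last row above is an empty string, not NA. A minimal sketch of turning it back into a missing value, assuming the NA entries in the data are real NaN values (the sample frame and the name `filled_max` are only for illustration):

```python
import numpy as np
import pandas as pd

# Sample data from the question; NA is assumed to be NaN
df = pd.DataFrame({
    "A": ["Yes", "Yes", "No", "No", np.nan, np.nan],
    "B": ["No", "Yes", "Yes", "No", "Yes", np.nan],
})

# As strings, 'Yes' > 'No' > '', so the row-wise max picks 'Yes' whenever it is present
filled_max = df.fillna("").max(axis=1)

# Keep the non-empty results; rows that are still '' (both inputs missing) become NaN
df["C"] = filled_max.where(filled_max != "")
```

One thing to keep in mind: a row with No in one column and NA in the other comes out as No with this approach, which the question does not specify either way.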

Upvotes: 7

Scott Boston

Reputation: 153460

Try:

(df == 'Yes').eval('A | B').astype(str).mask(df['A'].isna() & df['B'].isna())
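The expression above yields the strings 'True' and 'False' rather than Yes and No. A variation that maps the booleans to the labels from the question might look like the sketch below. It swaps `eval('A | B')` for `any(axis=1)` and assumes the NA entries are real NaN values; the names `any_yes` and `both_na` are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NA is assumed to be NaN
df = pd.DataFrame({
    "A": ["Yes", "Yes", "No", "No", np.nan, np.nan],
    "B": ["No", "Yes", "Yes", "No", "Yes", np.nan],
})

# True where at least one column is 'Yes' (comparisons against NaN are False)
any_yes = (df == "Yes").any(axis=1)
# True where both columns are missing
both_na = df["A"].isna() & df["B"].isna()

# Map booleans to labels, then blank out the rows where both inputs are missing
df["C"] = any_yes.map({True: "Yes", False: "No"}).mask(both_na)
```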

Upvotes: 2
