Reputation: 743

Comparing two columns and keeping NaNs

My data has the columns c1 and c2, and I want to generate c3 (True/False when both values are present, NaN when either is missing):

 c1  c2    c3
  x   x  True
NaN   y   NaN
  x NaN   NaN
  y   x False

My approach produces the desired result:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'c1': ['x', np.nan,'x','y'],
    'c2': ['x', 'y',np.nan,'x'],
})

# Compare only when both values are strings; otherwise (e.g. NaN) store NaN
df['c3'] = df.apply(
    lambda row: row['c1'] == row['c2']
    if type(row['c1']) is str and type(row['c2']) is str
    else np.nan,
    axis=1,
)

However, it is extremely slow: my dataset has 100k+ rows, and I repeat this for multiple column pairs.

Is there a more efficient or elegant way of achieving the same result? I am using pandas 0.24.1.

Upvotes: 3

Answers (3)

Sociopath

Reputation: 13426

A solution using np.select:

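# Rows where either value is missing -> None
cond1 = df['c1'].isnull() | df['c2'].isnull()
# Rows where both values are equal -> True
cond2 = df['c1'] == df['c2']

# np.select takes the choice of the first True condition; all other rows get False
df['c3'] = np.select([cond1, cond2], [None, True], False)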

print(df)

Output:

    c1   c2     c3
0    x    x   True
1  NaN    y   None
2    x  NaN   None
3    y    x  False
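A side note on how np.select resolves each row: it takes the choice belonging to the first condition that is True and falls back to the default when none is. In this answer cond1 and cond2 never overlap (a NaN never compares equal), so their order does not change the result here, but it matters whenever two conditions can both hold. A small standalone illustration with made-up arrays:

```python
import numpy as np

# Element 0 satisfies both conditions; element 3 satisfies neither.
first = np.array([True, True, False, False])
second = np.array([True, False, True, False])

# The choice of the first matching condition wins; unmatched elements get the default.
result = np.select([first, second], ['first', 'second'], default='neither')
print(result)

# Swapping the condition order changes the label element 0 receives.
reversed_result = np.select([second, first], ['second', 'first'], default='neither')
print(reversed_result)
```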

Upvotes: 3

Rajat Jain

Reputation: 2032

Try the following:

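# Cast to object so the column can hold NaN alongside True/False
df['c3'] = (df.c1 == df.c2).astype(object)

# Set c3 to NaN wherever c1 or c2 is missing
df.loc[df[['c1', 'c2']].isnull().any(axis=1), 'c3'] = np.nan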
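The same idea extends to the multiple column pairs mentioned in the question. A minimal sketch, assuming a hypothetical helper compare_pair and hypothetical extra columns d1/d2/d3; it casts to object first so NaN can be stored next to True/False, and it only looks at the two compared columns when deciding which rows are missing:

```python
import numpy as np
import pandas as pd


def compare_pair(df, left, right):
    """Hypothetical helper: True/False where both values exist, NaN otherwise."""
    result = (df[left] == df[right]).astype(object)
    # Rows where either side is missing get NaN instead of False.
    missing = df[left].isna() | df[right].isna()
    result[missing] = np.nan
    return result


df = pd.DataFrame({
    'c1': ['x', np.nan, 'x', 'y'],
    'c2': ['x', 'y', np.nan, 'x'],
    'd1': ['a', 'b', 'c', np.nan],  # hypothetical second pair
    'd2': ['a', 'z', 'c', 'd'],
})

# Hypothetical list of (left, right, output) column names.
pairs = [('c1', 'c2', 'c3'), ('d1', 'd2', 'd3')]
for left, right, out in pairs:
    df[out] = compare_pair(df, left, right)

print(df)
```

Each pair costs a few vectorized operations, so the loop runs over pairs rather than over rows.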

Upvotes: 0

BENY

Reputation: 323376

You do not need apply here. Use nunique to check whether each row has exactly one unique value, then use isnull + any to mask the rows containing NaN to NaN:

(df.nunique(axis=1) == 1).astype(object).mask(df.isnull().any(axis=1))
Out[61]: 
0     True
1      NaN
2      NaN
3    False
dtype: object
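One caveat with this approach: nunique and isnull().any run across every column of the row, so as written it only works when the frame contains just c1 and c2. A sketch, assuming a hypothetical extra column named other, that selects the pair first; it also shows that nunique skips NaN by default (dropna=True), which is why the mask step is needed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'c1': ['x', np.nan, 'x', 'y'],
    'c2': ['x', 'y', np.nan, 'x'],
    'other': [1, 2, 3, 4],  # hypothetical extra column
})

# Without selecting the pair, 'other' adds a distinct value to every row.
whole_frame = (df.nunique(axis=1) == 1).tolist()
print(whole_frame)

pair = df[['c1', 'c2']]  # restrict the row-wise checks to the compared columns

# NaN is ignored by nunique, so ('x', NaN) counts as one unique value.
counts = pair.nunique(axis=1).tolist()
print(counts)

c3 = (pair.nunique(axis=1) == 1).astype(object).mask(pair.isnull().any(axis=1))
print(c3)
```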

Upvotes: 2
