Reputation: 743
My data has the columns c1 and c2, and I want to generate c3:
c1 c2 c3
x x True
NaN y NaN
x NaN NaN
y x False
My approach generates the wanted result but is extremely slow:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'c1': ['x', np.nan,'x','y'],
'c2': ['x', 'y',np.nan,'x'],
})
df['c3'] = df.apply(lambda row: row['c1'] == row['c2'] if type(row['c1']) is str and type(row['c2']) is str else np.nan, axis=1)
This matters because my dataset has 100k+ rows and the process is repeated for multiple column pairs.
Is there a more efficient or elegant way of achieving the same result? I am using pandas 0.24.1.
Upvotes: 3
Views: 1095
Reputation: 13401
Solution using np.select
cond2 = df['c1'] == df['c2']
cond1 = (df['c1'].isnull()) | (df['c2'].isnull())
df['c3'] = np.select([cond1, cond2], [None, True], False)
print(df)
Output:
c1 c2 c3
0 x x True
1 NaN y None
2 x NaN None
3 y x False
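Since the question mentions multiple column pairs, the same np.select logic could be wrapped in a small helper. This is only a minimal sketch: the function name compare_columns and the list of pairs are hypothetical, not part of the original answer.

```python
import numpy as np
import pandas as pd


def compare_columns(df, left, right):
    """Hypothetical helper: True/False where both values exist, None otherwise."""
    missing = df[left].isnull() | df[right].isnull()
    equal = df[left] == df[right]
    # np.select checks the conditions in order, so "missing" wins over "equal".
    return np.select([missing, equal], [None, True], default=False)


df = pd.DataFrame({
    'c1': ['x', np.nan, 'x', 'y'],
    'c2': ['x', 'y', np.nan, 'x'],
})

# Each tuple is (left column, right column, output column); extend as needed.
pairs = [('c1', 'c2', 'c3')]
for left, right, out in pairs:
    df[out] = compare_columns(df, left, right)

print(df)
```

Because None is one of the choices, the resulting column has object dtype.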
Upvotes: 3
Reputation: 2022
Try the following:
df['c3'] = (df['c1'] == df['c2'])
df.loc[df[['c1', 'c2']].isnull().any(axis=1), 'c3'] = np.nan
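For reference, here is a self-contained sketch of this two-step idea on the sample data from the question. Two things in it are my assumptions rather than part of the answer: the null check is restricted to c1 and c2 so other columns cannot affect it, and the comparison is cast to object first so that writing NaN into it does not rely on pandas upcasting a boolean column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'c1': ['x', np.nan, 'x', 'y'],
    'c2': ['x', 'y', np.nan, 'x'],
})

# Step 1: element-wise comparison; NaN compared with anything gives False.
df['c3'] = (df['c1'] == df['c2']).astype(object)

# Step 2: overwrite rows where either input is missing.
either_missing = df[['c1', 'c2']].isnull().any(axis=1)
df.loc[either_missing, 'c3'] = np.nan

print(df)
```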
Upvotes: 0
Reputation: 323226
You do not need apply here. Use nunique to check whether each row has exactly one unique value, then use isnull + any with mask to set the rows containing NaN to NaN:
(df.nunique(axis=1) == 1).astype(object).mask(df.isnull().any(axis=1))
Out[61]:
0 True
1 NaN
2 NaN
3 False
dtype: object
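If you want to confirm that a mask-based version agrees with the original apply-based code and get a rough feel for the speed difference, something like the sketch below could be used. The frame size, the random test data, and the variable names are all assumptions for illustration, and it compares the columns directly with == rather than using nunique. Timings will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
n = 20000  # arbitrary size, chosen only for illustration
values = np.array(['x', 'y', np.nan], dtype=object)
big = pd.DataFrame({
    'c1': rng.choice(values, size=n),
    'c2': rng.choice(values, size=n),
})

# Baseline: the row-wise apply from the question.
start = time.perf_counter()
slow = big.apply(
    lambda row: row['c1'] == row['c2']
    if type(row['c1']) is str and type(row['c2']) is str else np.nan,
    axis=1,
)
slow_seconds = time.perf_counter() - start

# Vectorized: compare whole columns, then mask rows with a missing value.
start = time.perf_counter()
missing = big[['c1', 'c2']].isnull().any(axis=1)
fast = (big['c1'] == big['c2']).astype(object).mask(missing)
fast_seconds = time.perf_counter() - start

# Same missing positions, and same True/False values everywhere else.
same_missing = slow.isnull().equals(fast.isnull())
same_values = bool((slow[~missing] == fast[~missing]).all())
print(same_missing, same_values)
print('apply: %.3fs, vectorized: %.3fs' % (slow_seconds, fast_seconds))
```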
Upvotes: 2