Reputation: 1825
I have two dataframes which I am joining like so:
df3 = df1.join(df2.set_index('id'), on='id', how='left')
But I want to replace values for id-s which are present in df1 but not in df2 with NaN
(left join will just leave the values in df1 as they are). Whats the easiest way to accomplish this?
Upvotes: 1
Views: 1381
Reputation: 863226
I think you need Series.where
with Series.isin
:
df1['id'] = df1['id'].where(df1['id'].isin(df2['id']))
Or numpy.where
:
df1['id'] = np.where(df1['id'].isin(df2['id']), df1['id'], np.nan)
Sample:
df1 = pd.DataFrame({
'id':list('abc'),
})
df2 = pd.DataFrame({
'id':list('dmna'),
})
df1['id'] = df1['id'].where(df1['id'].isin(df2['id']))
print (df1)
id
0 a
1 NaN
2 NaN
Or solution with merge
and indicator
parameter:
df3 = df1.merge(df2, on='id', how='left', indicator=True)
df3['id'] = df3['id'].mask(df3.pop('_merge').eq('left_only'))
print (df3)
id
0 a
1 NaN
2 NaN
Upvotes: 1