Ach113
Ach113

Reputation: 1825

Pandas left join - how to replace values not present in second df with NaN

I have two dataframes which I am joining like so:

df3 = df1.join(df2.set_index('id'), on='id', how='left')

But I want to replace values for id-s which are present in df1 but not in df2 with NaN (left join will just leave the values in df1 as they are). Whats the easiest way to accomplish this?

Upvotes: 1

Views: 1381

Answers (1)

jezrael
jezrael

Reputation: 863226

I think you need Series.where with Series.isin:

df1['id'] = df1['id'].where(df1['id'].isin(df2['id']))

Or numpy.where:

df1['id'] = np.where(df1['id'].isin(df2['id']), df1['id'], np.nan)

Sample:

df1 = pd.DataFrame({
        'id':list('abc'),
})

df2 = pd.DataFrame({
        'id':list('dmna'),
})

df1['id'] = df1['id'].where(df1['id'].isin(df2['id']))
print (df1)
    id
0    a
1  NaN
2  NaN

Or solution with merge and indicator parameter:

df3 = df1.merge(df2, on='id', how='left', indicator=True)
df3['id'] = df3['id'].mask(df3.pop('_merge').eq('left_only'))
print (df3)
    id
0    a
1  NaN
2  NaN

Upvotes: 1

Related Questions