martinkabe
martinkabe

Reputation: 1139

Python pandas get rows from left table and from right table missing in left table

I have left and right table and I need to merge FileStamp values from both in this manner: take all values from left table and from right table missing in left table, join it by 'date':

import pandas as pd
left = pd.DataFrame({'FileStamp': ['T101', 'T102', 'T103', 'T104'], 'date': [20180101, 20180102, 20180103, 20180104]})
right = pd.DataFrame({'FileStamp': ['T501', 'T502'], 'date': [20180104, 20180105]})

Something like

result = pd.merge(left, right, how='outer', on='date')

but 'outer' is not good idea.

Desired output should look like

     FileStamp_x      date      FileStamp_y
0        T101       20180101         NaN
1        T102       20180102         NaN
2        T103       20180103         NaN
3        T104       20180104         NaN
4         NaN       20180105        T502

Is there any simple way how to achieve desired output?

Upvotes: 1

Views: 1121

Answers (2)

jezrael
jezrael

Reputation: 863146

Use filtering by isin before merge:

r = right[~right['date'].isin(left['date'])]
print (r)
  FileStamp      date
1      T502  20180105

result = pd.merge(left, r, how='outer', on='date')
print (result)
  FileStamp_x      date FileStamp_y
0        T101  20180101         NaN
1        T102  20180102         NaN
2        T103  20180103         NaN
3        T104  20180104         NaN
4         NaN  20180105        T502

Upvotes: 3

Shaido
Shaido

Reputation: 28367

You can adjust the values after the merge:

result = pd.merge(left, right, how='outer', on='date')
result['FileStamp_y'] = np.where(result['FileStamp_x'].isnull(), result['FileStamp_y'], np.nan)

Result:

    FileStamp_x     date  FileStamp_y
0          T101 20180101          NaN
1          T102 20180102          NaN
2          T103 20180103          NaN
3          T104 20180104          NaN
4           NaN 20180105         T502

Upvotes: 0

Related Questions