Reputation: 109
I have 2 dataframes like these:
df1 = pd.DataFrame(data = {'col1' : ['finance', 'accounting'], 'col2' : ['f1', 'a1']})
df2 = pd.DataFrame(data = {'col1' : ['finance', 'finance', 'finance', 'accounting', 'accounting','IT','IT'], 'col2' : ['f1','f2','f3','a1,'a2','I1','I2']})
df1
col1 col2
0 finance f1
1 accounting a1
df2
col1 col2
0 finance f1
1 finance f2
2 finance f3
3 accounting a1
4 accounting a2
5 IT I1
6 IT I2
I would like to do LEFT JOIN on col1 and ANTI-JOIN on col2. The output should look like this:
col1 col2
finance f2
finance f3
accounting a2
Could someone please help me how to do it properly in pandas. I tried both join
and merge
in pandas but it hasn't worked for me. Thanks in advance.
Upvotes: 0
Views: 219
Reputation: 9379
Just for fun, here's another way (other than the really elegant solution by @mozway):
df2 = ( df2
.reset_index() # save index as column 'index'
.set_index('col1') # make 'col1' the index
.loc[df1.col1,:] # filter for 'col1' values in df1
.set_index('col2', append=True) # add 'col2' to the index
.drop(index=df1.
set_index(list(df1.columns))
.index) # create a multi-index from df1 and drop all matches from df2
.reset_index() # make 'col1' and col2' columns again
.set_index('index') # make 'index' the index again
.rename_axis(index=None) ) # make the index anonymous
Output:
col1 col2
1 finance f2
2 finance f3
4 accounting a2
Upvotes: 0
Reputation: 260890
You can merge
and filter:
(df1.merge(df2, on='col1', suffixes=('_', None))
.loc[lambda d: d['col2'] != d.pop('col2_')]
)
Output:
col1 col2
1 finance f2
2 finance f3
4 accounting a2
Upvotes: 2