Reputation:
I am familiar with performing joind in pandas
pd.merge(A,B,on='key',how='inner')
But how do I write a query in pandas for these 3 given joins as they require a IF NULL
Upvotes: 3
Views: 2137
Reputation: 6813
Although the answer by piRSquared is a very nice one, here is another way of doing it:
import pandas as pd
Create DataFrames A
and B
:
A = pd.DataFrame({'key': range(1, 6), 'A': ['a'] * 5})
B = pd.DataFrame({'key': range(3, 8), 'B': ['b'] * 5})
The solution for example A (i.e. left excluding join):
First perform a left join and then keep only columns of A
that don't have a corresponding row in B
:
pd.merge(A, B, on = 'key', how = 'left')[~A.key.isin(B.key)]
key A B
0 1 a NaN
1 2 a NaN
The solution for example B (i.e. right excluding join): Very similar to solution A, but with a right join:
pd.merge(A, B, on = 'key', how = 'right')[~B.key.isin(A.key)]
key A B
3 6 NaN b
4 7 NaN b
The solution for example C (i.e. outer excluding join): First perform an full outer join:
outer = pd.merge(A, B, on = 'key', how = 'outer')
And then filter rows in A
and B
that do not have a corresponding key in B
and A
:
outer[outer.key.isin(list(A.key[~A.key.isin(B.key)]) + list(B.key[~B.key.isin(A.key)]))]
key A B
0 1 a NaN
1 2 a NaN
5 6 NaN b
6 7 NaN b
Upvotes: 1
Reputation: 294218
Setup
A = pd.DataFrame(dict(key=range(0, 5), col1=list('abcde')))
B = pd.DataFrame(dict(key=range(2, 7), col2=list('vwxyz')))
print(A, B, sep='\n' * 2)
col1 key
0 a 0
1 b 1
2 c 2
3 d 3
4 e 4
col2 key
0 v 2
1 w 3
2 x 4
3 y 5
4 z 6
pd.DataFrame.merge
The most direct way to do this is to use the indicator
parameter.
A.merge(B, 'outer', indicator=True)
col1 key col2 _merge
0 a 0 NaN left_only
1 b 1 NaN left_only
2 c 2 v both
3 d 3 w both
4 e 4 x both
5 NaN 5 y right_only
6 NaN 6 z right_only
Then we can use pd.DataFrame.query
A - B
A.merge(B, 'outer', indicator=True).query('_merge == "left_only"')
col1 key col2 _merge
0 a 0 NaN left_only
1 b 1 NaN left_only
B - A
A.merge(B, 'outer', indicator=True).query('_merge == "right_only"')
col1 key col2 _merge
5 NaN 5 y right_only
6 NaN 6 z right_only
Symmetric Difference
A.merge(B, 'outer', indicator=True).query('_merge != "both"')
col1 key col2 _merge
0 a 0 NaN left_only
1 b 1 NaN left_only
5 NaN 5 y right_only
6 NaN 6 z right_only
pd.Series.isin
(mostly)However, I'd like to simply use pd.Series.isin
as a boolean mask.
A - B
A[~A.key.isin(B.key)]
col1 key
0 a 0
1 b 1
B - A
B[~B.key.isin(A.key)]
col2 key
3 y 5
4 z 6
Symmetric Difference
A[~A.key.isin(B.key)].append(B[~B.key.isin(A.key)])
Or
A.append(B).drop_duplicates('key', keep=False)
col1 col2 key
0 a NaN 0
1 b NaN 1
3 NaN y 5
4 NaN z 6
Upvotes: 5