Reputation: 435
I have seem LEFT JOIN vs. LEFT OUTER JOIN in SQL Server and https://chrisalbon.com/python/data_wrangling/pandas_join_merge_dataframe/ but haven't found what I'm looking for. I have two python dataframes :
A = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6]]),
columns=['a', 'b', 'c'])
a b c
0 1 2 3
1 4 5 6
2 1 2 3
3 4 5 6
B = pd.DataFrame(np.array([[7, 8, 9], [7, 8, 9], [3, 2, 1], [3, 2, 1]]),
columns=['c', 'b', 'a'])
c b a
0 7 8 9
1 7 8 9
2 3 2 1
3 3 2 1
where the values [1, 2, 3] are repeated in both but [4, 5, 6] and [9, 8, 7] is not.
What I want it to have all the values from one dataframe that doesn't join the other dataframe. For example something like :
A some_left_outer_join B = C
C = pd.DataFrame(np.array([ [4, 5, 6], [4, 5, 6]]),
columns=['a', 'b', 'c'])
and to have all the values from both dataframes that doesn't join the other dataframe. For example something like :
A some_outer_join B = D
D = pd.DataFrame(np.array([ [4, 5, 6], [4, 5, 6] , [9, 8, 7] , [9, 8, 7]]),
columns=['a', 'b', 'c'])
Trying
(pd.merge(left=A, right=B, how='left', on=['a', 'b', 'c']))
a b c
0 1 2 3
1 1 2 3
2 4 5 6
3 1 2 3
4 1 2 3
5 4 5 6
give me the joined and not joined elements. I only want the not joined elements. Please, how can I get the desiere elements?
Upvotes: 3
Views: 2219
Reputation: 1000
One trick is to add a dummy columns to A and B. So in the non-matched rows, you will get NaN's in the dummy columns. You can drop the dummy column later
B['d'] = 0
A['e'] = 0
AB = pd.merge(left=A, right=B, how='outer', on=['a', 'b', 'c'])
C = AB[AB.d.apply(pd.isnull)]
D = AB[(AB.d.apply(pd.isnull)) | (AB.e.apply(pd.isnull))]
C
a b c e d
4 4 5 6 0.0 NaN
5 4 5 6 0.0 NaN
D
a b c e d
4 4 5 6 0.0 NaN
5 4 5 6 0.0 NaN
6 9 8 7 NaN 0.0
7 9 8 7 NaN 0.0
Upvotes: 0
Reputation: 862681
You can use parameter indicator=True
with outer join and then filter by boolean indexing
with Series.eq
for ==
and Series.ne
for !=
:
df = (pd.merge(left=A, right=B, how='outer', on=['a', 'b', 'c'], indicator=True))
print (df)
a b c _merge
0 1 2 3 both
1 1 2 3 both
2 1 2 3 both
3 1 2 3 both
4 4 5 6 left_only
5 4 5 6 left_only
6 9 8 7 right_only
7 9 8 7 right_only
C = df[df['_merge'].eq('left_only')]
print (C)
a b c _merge
4 4 5 6 left_only
5 4 5 6 left_only
D = df[df['_merge'].ne('both')]
print (D)
a b c _merge
4 4 5 6 left_only
5 4 5 6 left_only
6 9 8 7 right_only
7 9 8 7 right_only
If want also remove column:
s = df.pop('_merge')
C = df[s.eq('left_only')]
print (C)
a b c
4 4 5 6
5 4 5 6
D = df[s.ne('both')]
print (D)
a b c
4 4 5 6
5 4 5 6
6 9 8 7
7 9 8 7
Upvotes: 1