Israel Rodriguez
Israel Rodriguez

Reputation: 435

Outer Join on dataframes python

I have seem LEFT JOIN vs. LEFT OUTER JOIN in SQL Server and https://chrisalbon.com/python/data_wrangling/pandas_join_merge_dataframe/ but haven't found what I'm looking for. I have two python dataframes :

A = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6]]),
                   columns=['a', 'b', 'c'])

    a   b   c
0   1   2   3
1   4   5   6
2   1   2   3
3   4   5   6

B = pd.DataFrame(np.array([[7, 8, 9], [7, 8, 9], [3, 2, 1], [3, 2, 1]]),
                   columns=['c', 'b', 'a'])

    c   b   a
0   7   8   9
1   7   8   9
2   3   2   1
3   3   2   1

where the values [1, 2, 3] are repeated in both but [4, 5, 6] and [9, 8, 7] is not.

What I want it to have all the values from one dataframe that doesn't join the other dataframe. For example something like :

A some_left_outer_join B = C

C = pd.DataFrame(np.array([ [4, 5, 6], [4, 5, 6]]),
                   columns=['a', 'b', 'c'])

and to have all the values from both dataframes that doesn't join the other dataframe. For example something like :

A some_outer_join B = D

D = pd.DataFrame(np.array([ [4, 5, 6], [4, 5, 6] , [9, 8, 7] , [9, 8, 7]]),
                   columns=['a', 'b', 'c'])

Trying

 (pd.merge(left=A, right=B, how='left', on=['a', 'b', 'c']))

    a   b   c
0   1   2   3
1   1   2   3
2   4   5   6
3   1   2   3
4   1   2   3
5   4   5   6

give me the joined and not joined elements. I only want the not joined elements. Please, how can I get the desiere elements?

Upvotes: 3

Views: 2219

Answers (2)

Rajesh Bhat
Rajesh Bhat

Reputation: 1000

One trick is to add a dummy columns to A and B. So in the non-matched rows, you will get NaN's in the dummy columns. You can drop the dummy column later

B['d'] = 0
A['e'] = 0
AB = pd.merge(left=A, right=B, how='outer', on=['a', 'b', 'c'])
C = AB[AB.d.apply(pd.isnull)]
D = AB[(AB.d.apply(pd.isnull)) | (AB.e.apply(pd.isnull))]

C

    a   b   c   e   d
4   4   5   6   0.0 NaN
5   4   5   6   0.0 NaN

D
    a   b   c   e   d
4   4   5   6   0.0 NaN
5   4   5   6   0.0 NaN
6   9   8   7   NaN 0.0
7   9   8   7   NaN 0.0

Upvotes: 0

jezrael
jezrael

Reputation: 862681

You can use parameter indicator=True with outer join and then filter by boolean indexing with Series.eq for == and Series.ne for !=:

df = (pd.merge(left=A, right=B, how='outer', on=['a', 'b', 'c'], indicator=True))
print (df)
   a  b  c      _merge
0  1  2  3        both
1  1  2  3        both
2  1  2  3        both
3  1  2  3        both
4  4  5  6   left_only
5  4  5  6   left_only
6  9  8  7  right_only
7  9  8  7  right_only

C = df[df['_merge'].eq('left_only')]
print (C)
   a  b  c     _merge
4  4  5  6  left_only
5  4  5  6  left_only

D = df[df['_merge'].ne('both')]
print (D)
   a  b  c      _merge
4  4  5  6   left_only
5  4  5  6   left_only
6  9  8  7  right_only
7  9  8  7  right_only

If want also remove column:

s = df.pop('_merge')
C = df[s.eq('left_only')]
print (C)
   a  b  c
4  4  5  6
5  4  5  6

D = df[s.ne('both')]
print (D)
   a  b  c
4  4  5  6
5  4  5  6
6  9  8  7
7  9  8  7

Upvotes: 1

Related Questions