Chet Meinzer
Chet Meinzer

Reputation: 1741

how to do right join where key is null in python pandas

Is it possible to do a right join where key is null in python pandas. That is, can I join DateFrames to produce only values from the right that do not match the left?

Upvotes: 5

Views: 8532

Answers (4)

Garrett
Garrett

Reputation: 49886

I think this is best expressed as an index selection operation. To find all indices in one frame and not in another, try using the - operator on the two Dataframe index objects, as if the index objects were built-in python set objects. for example:

In [1]: dfa = pd.DataFrame({'A': range(5)}, index=range(5))

In [2]: dfb = pd.DataFrame({'A': range(10, 15)}, index=range(3,8))

In [3]: dfa
Out[3]: 
   A
0  0
1  1
2  2
3  3
4  4

In [4]: dfb
Out[4]: 
    A
3  10
4  11
5  12
6  13
7  14

In [5]: dfb.loc[set(dfb.index) - set(dfa.index)]
Out[5]: 
    A
5  12
6  13
7  14

Upvotes: 1

banderlog013
banderlog013

Reputation: 2503

Use https://pandas.pydata.org/docs/reference/api/pandas.Index.difference.html#pandas.Index.difference

import pandas a pd


dfa = pd.DataFrame({'A': range(5)}, index=range(5))
Out[13]:
   A
0  0
1  1
2  2
3  3
4  4


dfb = pd.DataFrame({'A': range(10, 15)}, index=range(3,8))
Out[14]:
    A
3  10
4  11
5  12
6  13
7  14

dfb[dfb.index.difference(dfa.index)]
Out[15]:
    A
5  12
6  13
7  14

Upvotes: 0

Ichta
Ichta

Reputation: 308

A more compact solution with merge:

df = dfa.merge(dfb, indicator='i', how='outer').query('i == "right_only"').drop('i', 1)
print (df)

Solution taken from here:

https://stackoverflow.com/a/49487564/9539462

Upvotes: 1

jgaw
jgaw

Reputation: 1734

You could make a dummy column of ones in each dataframe, use pd.merge() to right join the dataframes, and then use a mask to filter out the nulls.

In [1]: dfa = pd.DataFrame({'A': range(4, 8)})
In [2]: dfb = pd.DataFrame({'A': range(6, 10)})
In [3]: dfa['ones'] = 1
In [4]: dfb['ones'] = 1
In [5]: dfa 
Out[5]: 
   A  ones
0  4  1
1  5  1
2  6  1
3  7  1

In [6]: dfb 
Out[6]: 
   A  ones
0  6  1
1  7  1
2  8  1
3  9  1

In [7]: df = pd.merge(dfa, dfb, on = 'A', how = 'right')
In [8]: df
Out[8]: 
    A   ones_x  ones_y
0   6   1       1
1   7   1       1
2   8   NaN     1
3   9   NaN     1

In [8]: df[df.ones_x.notnull()][['A']]
Out[8]: 
    A
0   6
1   7

Upvotes: 1

Related Questions