Keithx
Keithx

Reputation: 3148

Pandas analogue of JOIN with WHERE clause

I'm doing joining of two dataframe (A and B) in python's pandas.

The goal is to receive all the pure rows from B (sql analogue- right join B on A.client_id=B.client_id where A.client_id is null)

In pandas all I know for this operation is to do merging but I don't know how to set up the conditions (where clause):

x=pd.merge(A,B,how='right',on=['client_id','client_id']

Upvotes: 10

Views: 17571

Answers (2)

Quickbeam2k1
Quickbeam2k1

Reputation: 5437

For me, this is also a bit unsatisfying, but I think the recommended way is something like:

x = pd.merge(A[A["client_ID"].isnull()], B, 
             how='right', on=['client_id', 'client_id'])

More information can be found in the pandas documentation

Additionally, you might use something like A.where(A["client_ID"].isnull()) for filtering. Also, note my mistake in the previous version. I was comparing to Nonebut you should use the isnull() function

Upvotes: 7

piRSquared
piRSquared

Reputation: 294228

option 1
indicator=True

A.merge(B, on='client_id', how='right', indicator=True) \
    .query('_merge == "right_only"').drop('_merge', 1)

setup

A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))

results

enter image description here

more explanation
indicator=True puts another column in the results of the merge that indicates whether that rows results are from the left, right, or both.

A.merge(B, on='client_id', how='outer', indicator=True)

enter image description here

So, I just use query to filter out the right_only indicator then drop that column.


option 2
not really a merge. You can use query again to only pull rows of B where its 'client_id's are not in A

B.query('client_id not in @A.client_id')

or an equivalent way of saying the same thing (but faster)

B[~B.client_id.isin(A.client_id)]

enter image description here

Upvotes: 7

Related Questions