Reputation: 3148
I'm doing joining of two dataframe (A and B) in python's pandas.
The goal is to receive all the pure rows from B (sql analogue- right join B on A.client_id=B.client_id where A.client_id is null)
In pandas all I know for this operation is to do merging but I don't know how to set up the conditions (where clause):
x=pd.merge(A,B,how='right',on=['client_id','client_id']
Upvotes: 10
Views: 17571
Reputation: 5437
For me, this is also a bit unsatisfying, but I think the recommended way is something like:
x = pd.merge(A[A["client_ID"].isnull()], B,
how='right', on=['client_id', 'client_id'])
More information can be found in the pandas documentation
Additionally, you might use something like A.where(A["client_ID"].isnull())
for filtering. Also, note my mistake in the previous version. I was comparing to None
but you should use the isnull()
function
Upvotes: 7
Reputation: 294228
option 1
indicator=True
A.merge(B, on='client_id', how='right', indicator=True) \
.query('_merge == "right_only"').drop('_merge', 1)
setup
A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))
results
more explanation
indicator=True
puts another column in the results of the merge that indicates whether that rows results are from the left, right, or both.
A.merge(B, on='client_id', how='outer', indicator=True)
So, I just use query
to filter out the right_only
indicator then drop that column.
option 2
not really a merge. You can use query
again to only pull rows of B
where its 'client_id'
s are not in A
B.query('client_id not in @A.client_id')
or an equivalent way of saying the same thing (but faster)
B[~B.client_id.isin(A.client_id)]
Upvotes: 7