Benjamin S

Reputation: 5

Python pandas: Remove all elements of one dataframe that aren't included in another

I'm working with two dataframes in pandas:

DF1: Product_ID, Num_Reviews

DF2: Product_ID, Reviewer_ID, Review_Score

I want to filter DF2 so that it only contains entries whose Product_ID exists in DF1. I'm not very familiar with pandas, or even Python for that matter, and couldn't find a clear way to check whether a dataframe includes a key and filter based on that.

Thanks!

Upvotes: 0

Views: 285

Answers (2)

shx2

Reputation: 64298

An efficient way to find which Product_ID values in DF2 also appear in DF1 is numpy's in1d. It gives you a boolean mask with one entry per row of DF2.

Then you simply index DF2 with the mask to get the new dataframe you want.

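import numpy as np

# True for each DF2 row whose Product_ID also appears in DF1
mask = np.in1d(DF2.Product_ID, DF1.Product_ID)
# Keep only those rows (a leading ~ on the mask would keep the opposite set)
DF2 = DF2[mask]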
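
To try the mask approach end to end, here is a minimal sketch on made-up data. The sample values and the name `filtered` are assumptions for illustration only. It uses np.isin, which the NumPy docs recommend over in1d for new code and which performs the same membership test here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, only for illustration.
DF1 = pd.DataFrame({"Product_ID": [1, 2, 3], "Num_Reviews": [10, 5, 7]})
DF2 = pd.DataFrame({
    "Product_ID": [1, 4, 2, 5],
    "Reviewer_ID": ["a", "b", "c", "d"],
    "Review_Score": [5, 3, 4, 1],
})

# Boolean array: True where DF2's Product_ID also appears in DF1.
mask = np.isin(DF2["Product_ID"].to_numpy(), DF1["Product_ID"].to_numpy())

# Boolean indexing keeps only the rows where the mask is True.
filtered = DF2[mask]
print(filtered)
```

Only the rows with Product_ID 1 and 2 should survive, since 4 and 5 do not occur in DF1.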

Upvotes: 0

Zero

Reputation: 76917

Here's one way to do it.

df2[df2['Product_ID'].isin(df1['Product_ID'].unique())]

Get the unique Product_ID values from df1, then use isin() to keep only the rows of df2 whose Product_ID is among them.
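
A small self-contained sketch of the same idea, with invented sample data (the values and the name `kept` are assumptions, not from the question). Series.isin also accepts a Series directly, so the sketch skips .unique(), and reset_index is optional tidying.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df1 = pd.DataFrame({"Product_ID": [1, 2, 3], "Num_Reviews": [10, 5, 7]})
df2 = pd.DataFrame({
    "Product_ID": [1, 4, 2, 2, 5],
    "Reviewer_ID": ["a", "b", "c", "d", "e"],
    "Review_Score": [5, 3, 4, 2, 1],
})

# isin() returns a boolean Series aligned with df2's rows.
in_df1 = df2["Product_ID"].isin(df1["Product_ID"])

# Keep matching rows, then renumber the index from 0.
kept = df2[in_df1].reset_index(drop=True)
print(kept)
```

Note that the result is a new dataframe. Assign it back to df2 if you want to replace the original.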

Upvotes: 1
