Reputation: 5
I'm working with two dataframes in pandas:
DF1: Product_ID, Num_Reviews
DF2: Product_ID, Reviewer_ID, Review_Score
I want to filter DF2 so that it only contains entries whose Product_ID exists in DF1. I'm not very familiar with pandas, or even Python for that matter, and couldn't find a clear way to check whether a dataframe includes a key and filter based on that.
Thanks!
Upvotes: 0
Views: 285
Reputation: 64298
An efficient way to find the Product_IDs that appear in both dataframes is numpy's in1d. That gives you a boolean mask that is True wherever a Product_ID in DF2 also exists in DF1.
Then you simply index DF2 with the mask to get the new dataframe you want.
import numpy as np
# True where DF2's Product_ID also appears in DF1 (these are the rows to keep)
mask = np.in1d(DF2.Product_ID, DF1.Product_ID)
DF2 = DF2[mask]
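For illustration, here is a minimal sketch of the masking approach on made-up data (the sample values and the name `filtered` are assumptions, not from the question). It uses np.isin, which recent NumPy versions recommend in place of in1d; the mask is True for the rows to keep.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration.
DF1 = pd.DataFrame({"Product_ID": [1, 2, 3], "Num_Reviews": [10, 5, 7]})
DF2 = pd.DataFrame({
    "Product_ID": [1, 2, 4, 1],
    "Reviewer_ID": ["a", "b", "c", "d"],
    "Review_Score": [5, 3, 4, 2],
})

# True where DF2's Product_ID also appears in DF1.
mask = np.isin(DF2["Product_ID"].to_numpy(), DF1["Product_ID"].to_numpy())

# Boolean indexing keeps only the rows where the mask is True.
filtered = DF2[mask]
print(filtered)
```

The row with Product_ID 4 is dropped because that ID does not occur in DF1.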
Upvotes: 0
Reputation: 76917
Here's one way to do it.
df2[df2['Product_ID'].isin(df1['Product_ID'].unique())]
Get the unique Product_ID values from df1 and filter df2['Product_ID'] against them using isin().
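As a rough end-to-end sketch of the same idea, assuming small made-up dataframes (the sample values and the names `keep` and `df2_filtered` are hypothetical):

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df1 = pd.DataFrame({"Product_ID": [101, 102], "Num_Reviews": [2, 1]})
df2 = pd.DataFrame({
    "Product_ID": [101, 103, 102, 101],
    "Reviewer_ID": [1, 2, 3, 4],
    "Review_Score": [4.0, 2.5, 5.0, 3.0],
})

# Boolean Series: True for reviews whose product is listed in df1.
keep = df2["Product_ID"].isin(df1["Product_ID"].unique())

# Filter, then renumber the rows so the index is contiguous again.
df2_filtered = df2[keep].reset_index(drop=True)
print(df2_filtered)
```

The reset_index(drop=True) call is optional; without it the filtered frame keeps the original row labels from df2.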
Upvotes: 1