How to find all records with the same ID between two DataFrames?

Question

I have two DataFrames with movie review data from two different platforms (id, title, review, etc). All rows about a particular movie need to be removed from one DataFrame if that movie has not been reviewed in the other DataFrame. Here's an example:

import pandas as pd 

data1 = [[1, 'Great movie!', 'Spiderman'], [1, 'Not my preference', 'Spiderman'], [2, 'Just average...', 'Captain America'], [4, 'Tolerable', 'Avengers']]

data2 = [[1, 'Did not think much of this', 'Spiderman'], [2, 'Great in my opinion!', 'Captain America'], [3, 'Could not finish', 'Batman Returns']]

df1 = pd.DataFrame(data1, columns = ['id', 'review', 'movie title'])

df2 = pd.DataFrame(data2, columns = ['id', 'review', 'movie title'])

df1.insert(3, "isValid", pd.Series(df1.id.isin(df2.id).values.astype(bool)))
df1 = df1[df1.isValid != False]

I'm wondering if there's a more efficient way to do this?

Thanks in advance for any help!

SmileyProd · Accepted Answer

If you want to have the information in df1 of 'isValid' you can do this:

df1["isValid"] = df1.id.isin(df2.id)
new_df = df1.loc[df1.isValid == True]

   id             review      movie title  isValid
0   1       Great movie!        Spiderman     True
1   1  Not my preference        Spiderman     True
2   2    Just average...  Captain America     True

But if you don't care about 'isValid' and just used it in your answer for selection you can simply do this:

new_df = df1.loc[df1.id.isin(df2.id)]

   id             review      movie title
0   1       Great movie!        Spiderman
1   1  Not my preference        Spiderman
2   2    Just average...  Captain America

How to find all records with the same ID between two DataFrames?

Answers (2)

Related Questions