kishkebab
kishkebab

Reputation: 19

How to find all records with the same ID between two DataFrames?

I have two DataFrames with movie review data from two different platforms (id, title, review, etc). All rows about a particular movie need to be removed from one DataFrame if that movie has not been reviewed in the other DataFrame. Here's an example:

import pandas as pd 

data1 = [[1, 'Great movie!', 'Spiderman'], [1, 'Not my preference', 'Spiderman'], [2, 'Just average...', 'Captain America'], [4, 'Tolerable', 'Avengers']]

data2 = [[1, 'Did not think much of this', 'Spiderman'], [2, 'Great in my opinion!', 'Captain America'], [3, 'Could not finish', 'Batman Returns']]

df1 = pd.DataFrame(data1, columns = ['id', 'review', 'movie title'])

df2 = pd.DataFrame(data2, columns = ['id', 'review', 'movie title'])

df1.insert(3, "isValid", pd.Series(df1.id.isin(df2.id).values.astype(bool)))
df1 = df1[df1.isValid != False]

I'm wondering if there's a more efficient way to do this?

Thanks in advance for any help!

Upvotes: 1

Views: 2497

Answers (2)

SmileyProd
SmileyProd

Reputation: 796

If you want to have the information in df1 of 'isValid' you can do this:

df1["isValid"] = df1.id.isin(df2.id)
new_df = df1.loc[df1.isValid == True]

   id             review      movie title  isValid
0   1       Great movie!        Spiderman     True
1   1  Not my preference        Spiderman     True
2   2    Just average...  Captain America     True

But if you don't care about 'isValid' and just used it in your answer for selection you can simply do this:

new_df = df1.loc[df1.id.isin(df2.id)]

   id             review      movie title
0   1       Great movie!        Spiderman
1   1  Not my preference        Spiderman
2   2    Just average...  Captain America

Upvotes: 3

Onyambu
Onyambu

Reputation: 79388

you are looking for merge function. This will drop all the ones not seen from both df1 and df2.

df1.merge(df2,on=["id","movie title"])
Out: 
   id           review_x      movie title                    review_y
0   1       Great movie!        Spiderman  Did not think much of this
1   1  Not my preference        Spiderman  Did not think much of this
2   2    Just average...  Captain America        Great in my opinion!

your df1 id the id ,review_x ,movie title and df2 is id, movie title review_y

Upvotes: 0

Related Questions