Greedy Coder
Greedy Coder

Reputation: 97

Compare dataframes based on row values in Pandas

What is the best way to compare two datasets (.csv files) using Pandas where there is no 1:1 cardinality between the data?

For example: Here's a sample from dataset one -

#### Row Item Color Price
01 Shirt Red $30
02 Hat Blue $10

And a sample from dataframe two -

#### Row Item Color Price
01 Trouser Black $20
02 Bag Yellow $ 30
03 Hat Blue $10
04 Shirt Red $30

So if I wanted to compare all rows in both datasets where there is a Shirt row.

what is the best way to do it?

I'm using Pandas/Python3.7

Thanks!

Upvotes: 0

Views: 172

Answers (3)

BENY
BENY

Reputation: 323226

Filter after merge

df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']
Out[89]: 
   Row_x   Item Color_x Price_x  Row_y Color_y Price_y
0      1  Shirt     Red     $30      4     Red     $30

Upvotes: 1

Mayank Porwal
Mayank Porwal

Reputation: 34056

You can simply use pandas merge like this:

pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')

This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.

Let me know if this is what you want.

Upvotes: 1

anky
anky

Reputation: 75080

check this example:

DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})

    c1  c2  c3
0   abc xyz 0
1   abc mno 0
2   iop yut 0
3   iop trg 0


DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})
    c1  c2
0   iop yut
1   abc mno
2   bhj uio
3   iop yut
4   xdf edc
match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')
print(match)

    c1  c2  c3
0   abc mno 0
1   iop yut 0
2   iop yut 0

So by using pd.merge you can get the matching rows

Upvotes: 1

Related Questions