Reputation: 97
What is the best way to compare two datasets (.csv files) using Pandas where there is no 1:1 cardinality between the data?
For example: Here's a sample from dataset one -
#### Row Item Color Price
01 Shirt Red $30
02 Hat Blue $10
And a sample from dataframe two -
#### Row Item Color Price
01 Trouser Black $20
02 Bag Yellow $ 30
03 Hat Blue $10
04 Shirt Red $30
So if I wanted to compare all rows in both datasets where there is a Shirt row.
what is the best way to do it?
I'm using Pandas/Python3.7
Thanks!
Upvotes: 0
Views: 172
Reputation: 323226
Filter after merge
df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']
Out[89]:
Row_x Item Color_x Price_x Row_y Color_y Price_y
0 1 Shirt Red $30 4 Red $30
Upvotes: 1
Reputation: 34056
You can simply use pandas merge
like this:
pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')
This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.
Let me know if this is what you want.
Upvotes: 1
Reputation: 75080
check this example:
DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})
c1 c2 c3
0 abc xyz 0
1 abc mno 0
2 iop yut 0
3 iop trg 0
DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})
c1 c2
0 iop yut
1 abc mno
2 bhj uio
3 iop yut
4 xdf edc
match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')
print(match)
c1 c2 c3
0 abc mno 0
1 iop yut 0
2 iop yut 0
So by using pd.merge
you can get the matching rows
Upvotes: 1