How to compare two different structured excel in python

Question

Let say i have file1.xlsx and file2.xlsx with columns ['id','name','role','gender'] and ['id','country','role'] respectively. I am trying to retrieve all rows having same role with their respective ids in a new excel file. I tried below code to achieve that -

import pandas

file1 = file1.xlsx
file2 = file2.xlsx
data1 = pandas.read_excel(file1)
data2 = pandas.read_excel(file2)
dataDiff = data1[data1 != data2]
writer = pandas.ExcelWriter('result.xlsx')
dataDiff.to_excel(writer, sheet_name='Sheet1',index=false)
writer.save()

But this block of is giving me below error

Exception: Can only compare identically-labeled Dataframe objects'

I am still a newbie in python. Any help would be appreciated. Can we use anything other than pandas?

ansev · Accepted Answer

We can use Series.isin to performance a boolean indexing:

mask=data1.role.isin(data2.role.tolist())
data_equal=data1[mask]

data_diff=data1[~mask]

We can also use DataFrame.merge as suggested @Jon Clements:

data_equal=data.merge(data2,on='role',how='inner')

or

data_diff= ( data1.merge(data2,on='role',how='outer',indicator=True)     
                  .query('_merge == "left_only"')
                  .drop('_merge',axis=1) )

How to compare two different structured excel in python

Answers (1)

Related Questions