Reputation: 870
Let say i have file1.xlsx and file2.xlsx with columns ['id','name','role','gender'] and ['id','country','role'] respectively. I am trying to retrieve all rows having same role with their respective ids in a new excel file. I tried below code to achieve that -
import pandas
file1 = file1.xlsx
file2 = file2.xlsx
data1 = pandas.read_excel(file1)
data2 = pandas.read_excel(file2)
dataDiff = data1[data1 != data2]
writer = pandas.ExcelWriter('result.xlsx')
dataDiff.to_excel(writer, sheet_name='Sheet1',index=false)
writer.save()
But this block of is giving me below error
Exception: Can only compare identically-labeled Dataframe objects'
I am still a newbie in python. Any help would be appreciated. Can we use anything other than pandas?
Upvotes: 1
Views: 99
Reputation: 30920
We can use Series.isin
to performance a boolean indexing
:
mask=data1.role.isin(data2.role.tolist())
data_equal=data1[mask]
data_diff=data1[~mask]
We can also use DataFrame.merge
as suggested @Jon Clements:
data_equal=data.merge(data2,on='role',how='inner')
or
data_diff= ( data1.merge(data2,on='role',how='outer',indicator=True)
.query('_merge == "left_only"')
.drop('_merge',axis=1) )
Upvotes: 1