Reputation: 79
I would like to compare to excel file and output to a new excel file for import to our stupid ERP, to improve my workload but I have some question
my excel file like this, around 17 columns NAME DESCRIPTION Our REF TEAM ADDRESS LINE1 ... ........
only ADDRESS LINE1 ADDRESS LINE2 and tel need to compare' because the ADDRESS and tel will change by 3rd party
The excel has 2 worksheets named cust code and vendor code, how to compare two excel with 2 worksheets
#compare
lastdaySet = pd.read_excel('lastday.xlsx',sheet_name =0,index=False,inplace=True)
todaySet = pd.read_excel('today.xlsx', sheet_name =0,index=False, inplace=True)
difference = todaySet[todaySet!=lastdaySet]
print (difference)
#Output excel
....
but has error Output error
> Traceback (most recent call last): File
> "C:/Users/SupportTeamA/PycharmProjects/untitled2/excelhandel.py", line 74,
> in <module>
> difference = todaySet[todaySet != lastdaySet] File "C:\Users\SupportTeamA\PycharmProjects\untitled2\venv\lib\site-packages\pandas\core\ops\__init__.py",
> line 838, in f
> raise ValueError( ValueError: Can only compare identically-labeled DataFrame objects
Upvotes: 0
Views: 1757
Reputation: 1
the error because you read all of sheets you need only specific column can you use the following code i hope to help you
import pandas as pd
`import os
#compare
lastdaySet = pd.read_excel('lastday.xlsx',sheet_name =0,usecols =['column_name1','column_name2'])
todaySet = pd.read_excel('today.xlsx', sheet_name =0,usecols =['column_name1','column_name2'])
difference = todaySet[todaySet!=lastdaySet]
print (difference)
#Output excel
difference.to_excel("output.xlsx")
os.startfile('"output.xlsx"')
Upvotes: 0
Reputation: 168
It does not compare if the number of rows in the two sheets is different and gives same error. Here is a workaround when todaySet contains greater number of rows than lastdaySet:
import pandas as pd
lastdaySet = pd.read_excel('lastday.xlsx',sheet_name =0,index=False,inplace=True)
todaySet = pd.read_excel('today.xlsx', sheet_name =0,index=False, inplace=True)
lenToday = len (todaySet.index)
lenLastday = len(lastdaySet.index)
if lenToday > lenLastday:
temp = todaySet.iloc[:lenLastday]
difference = temp[temp!=lastdaySet]
remaining = todaySet.iloc[lenLastday:]
difference = pd.concat([difference, remaining]) #.drop_duplicates().reset_index(drop=True)
difference = difference.reset_index(drop=True)
else:
difference = todaySet[todaySet!=lastdaySet]
print ('Final Diff:')
print (difference)
For comparison of Sheet2, you'll set sheet_name = 1
.
Upvotes: 2