petercheung
petercheung

Reputation: 79

Python P andas compare two excel files and export

I would like to compare to excel file and output to a new excel file for import to our stupid ERP, to improve my workload but I have some question

my excel file like this, around 17 columns NAME DESCRIPTION Our REF TEAM ADDRESS LINE1 ... ........ only ADDRESS LINE1 ADDRESS LINE2 and tel need to compare' because the ADDRESS and tel will change by 3rd party

The excel has 2 worksheets named cust code and vendor code, how to compare two excel with 2 worksheets

         #compare
lastdaySet = pd.read_excel('lastday.xlsx',sheet_name =0,index=False,inplace=True)
todaySet = pd.read_excel('today.xlsx', sheet_name =0,index=False, inplace=True)    
difference = todaySet[todaySet!=lastdaySet]
 print (difference)
#Output excel
        ....

but has error Output error

> Traceback (most recent call last):   File
> "C:/Users/SupportTeamA/PycharmProjects/untitled2/excelhandel.py", line 74,
> in <module>
>     difference = todaySet[todaySet != lastdaySet]   File "C:\Users\SupportTeamA\PycharmProjects\untitled2\venv\lib\site-packages\pandas\core\ops\__init__.py",
> line 838, in f
>     raise ValueError( ValueError: Can only compare identically-labeled DataFrame objects

Upvotes: 0

Views: 1757

Answers (2)

Demo Creation
Demo Creation

Reputation: 1

the error because you read all of sheets you need only specific column can you use the following code i hope to help you

import pandas as pd
`import os
  #compare
lastdaySet = pd.read_excel('lastday.xlsx',sheet_name =0,usecols =['column_name1','column_name2'])
todaySet = pd.read_excel('today.xlsx', sheet_name =0,usecols =['column_name1','column_name2'])    
difference = todaySet[todaySet!=lastdaySet]
 print (difference)
#Output excel
difference.to_excel("output.xlsx")
os.startfile('"output.xlsx"')

Upvotes: 0

arshad
arshad

Reputation: 168

It does not compare if the number of rows in the two sheets is different and gives same error. Here is a workaround when todaySet contains greater number of rows than lastdaySet:

import pandas as pd

lastdaySet = pd.read_excel('lastday.xlsx',sheet_name =0,index=False,inplace=True)
todaySet = pd.read_excel('today.xlsx', sheet_name =0,index=False, inplace=True)

lenToday = len (todaySet.index)
lenLastday = len(lastdaySet.index)
if lenToday > lenLastday:
    temp = todaySet.iloc[:lenLastday]
    difference = temp[temp!=lastdaySet]

    remaining = todaySet.iloc[lenLastday:]
    difference = pd.concat([difference, remaining])  #.drop_duplicates().reset_index(drop=True)
    difference = difference.reset_index(drop=True)
else:
    difference = todaySet[todaySet!=lastdaySet]
print ('Final Diff:')
print (difference)

For comparison of Sheet2, you'll set sheet_name = 1.

Upvotes: 2

Related Questions