Reputation: 371
I have to compare data present in the two excel sheets and insert the difference in the third sheet.
I am aware that I have to do the comparison twice i.e. file1!=file2
and file2!=file1
.
Q1:
Is there a way that only one step finds the difference in the two files?
So far I have written the below code and its working also:
import pandas as pd
df1 = pd.read_excel('/Users/Testing/Data1.xlsx')
df2 = pd.read_excel('/Users/Testing/Data2.xlsx')
difference = df1[df1!=df2]
print(difference)
Sample data:
Data1:
S.No ID Name RegionName RegionCode
1. 191 Roger EU 76
2. 334 Martin TT 09
3. 442 Steven YU 89
4. 776 Georgee TT 09
Data2:
S.No ID Name RegionName RegionCode
01. 111 Ralph EU 76
02. 254 Luthur SE 19
03. 564 Todd OO 67
04. 776 Georgee TT 09
Difference I am getting is:
S.No ID Name RegionName RegionCode
01. 191 Roger EU 76
02. 334 Martin TT 09
03. 442 Steven YU 89
04. 776 Georgee TT 09
05. 111 Ralph EU 76
06. 254 Luthur SE 19
07. 564 Todd OO 67
08. 776 Georgee TT 09
Expected result is:
S.No ID Name RegionName RegionCode
01 191 Roger EU 76
02 334 Martin TT 09
03 442 Steven YU 89
04 254 Luthur SE 19
05 564 Todd OO 67
06 111 Ralph EU 76
Q2:
How can I get the above result? The requirement is to search and store the value which is present in only one file(e.g.. Data1) and not in the other(e.g. Data2).
Any help is highly appreciated.
Upvotes: 0
Views: 71
Reputation: 14093
you can concat and drop_duplicates:
new_df = pd.concat([df1,df2])
new_df.drop_duplicates(keep=False)
S.No ID Name RegionName RegionCode
0 1.0 191 Roger EU 76
1 2.0 334 Martin TT 9
2 3.0 442 Steven YU 89
0 1.0 111 Ralph EU 76
1 2.0 254 Luthur SE 19
2 3.0 564 Todd OO 67
Upvotes: 2