Qin Wu
Qin Wu

Reputation: 1

Compare two excel files using python

I have two xlsx files as follows:

value1        value2   
3900162750    10    
3900163003    19
2311009200    22 

value1        value2   
3900163003    5    
3900162750    9
2311009200    88

How do I match value1 from xlsx1 with xlsx2 and compare related value2?

for instance, match 3900163003 from xlsx1 with xlsx2 and find out value2 is decreasing.

Upvotes: 0

Views: 543

Answers (2)

hpchavaz
hpchavaz

Reputation: 1388

Use pandas and pandas merge :

import pandas as pd

df = df1.merge(df2, on='value1', how= 'outer', suffixes=('_df1', '_df2'))
df['decrease'] = df['value2_df2'] < df['value2_df1']

giving:

       value1  value2_df1  value2_df2  decrease
0  3900162750          10           9      True
1  3900163003          19           5      True
2  2311009200          22          88     False

Notes:

  • how parameter used to provide some output if df1 and df2 are not perfectly matching
  • suffixes parameter used to provide the origin of the data
  • it could be suitable to give a other name to the 'decrease' column to remember the way it is computed

Data from :

data1 = """
value1        value2   
3900162750    10    
3900163003    19
2311009200    22 
"""
data2 = """
value1        value2   
3900163003    5    
3900162750    9
2311009200    88
"""
df1 = pd.read_csv(io.StringIO(data1), sep=r' +')
df2 = pd.read_csv(io.StringIO(data2), sep=r' +')

In real live read_csv should be replaced by read_excel

Upvotes: 0

user7864386
user7864386

Reputation:

You can use pandas. First read the excel files as pandas.DataFrame objects. Then using pd.DataFrame.merge method, merge the two dataframes on 'value1' column. Then create another column 'Decreasing' by comparing the two 'value2' columns. By default, the column from the left (in this case, df1) gets suffix _x and the one from the right (in this case, df2) gets suffix _y:

import pandas as pd
df1 = pd.read_excel('first_excel_file.xlsx')
df2 = pd.read_excel('second_excel_file.xlsx')
df = df1.merge(df2, on='value1')
df['Decreasing'] = df['value2_x'] > df['value2_y']

Output:

       value1  value2_x  value2_y  Decreasing
0  3900162750        10         9        True
1  3900163003        19         5        True
2  2311009200        22        88       False

Upvotes: 1

Related Questions