pynew
pynew

Reputation: 21

how to compare column between two excel in python?

I have two excel

Excel 1 :

A,B,C

1,2,3

Excel 2 :

A,C,B

1,3,2

How can i re position the excel 2 base on excel 1 column ?

so that A,C,B and become A,B,C

I use the following code to check column orders:

comparison_Columns = pd.read_excel(xls).columns == pd.read_excel(xls2).columns
if all(comparison_Columns):
    pass
else:
    print('Wrong column order !!!!! ')

Upvotes: 0

Views: 76

Answers (3)

Kaushal Kishore
Kaushal Kishore

Reputation: 130

This code snippet will work fine:

def areColumnSame(df1, df2, checkTypes = True):
    if checkTypes:
        type1 = dict(df1.dtypes)
        type2 = dict(df2.dtypes)
        return type1 == type2

    else:
        col1 = list(df1.columns)
        col2 = list(df2.columns)
        col1.sort()
        col2.sort()
        return col1 == col2

To show how the above code works let us explore examples:

Consider three excel files:

| A | B | C |
|---|---|---|
| 1 | 2 | 3 |
| 4 | 5 | 6 |

| A | C | B |
|---|---|---|
| 1 | 3 | 2 |
| 4 | 6 | 5 |

| A | B | C | A.1 | B.1 | C.1 |
|---|---|---|-----|-----|-----|
| 1 | 2 | 3 | 1   | 2   | 3   |
| 4 | 5 | 6 | 4   | 5   | 6   |

Now for the first file the dict(df.dtypes) is shown below:

{'A': dtype('int64'),
 'B': dtype('int64'),
 'C': dtype('int64')}

Similarly for other two files:

{'A': dtype('int64'),
 'C': dtype('int64'),
 'B': dtype('int64')}

and

{'A': dtype('int64'),
 'B': dtype('int64'),
 'C': dtype('int64'),
 'A.1': dtype('int64'),
 'B.1': dtype('int64'),
 'C.1': dtype('int64')}

We just need to compare these dictionaries to get the result. At the same time, it also checks for the type of data.

Hence for the comparison between the first two files will be true whereas the comparison with third will return false.

But you can always disable the type-checking in which case we will just check whether [A, B, C] is the same as [A, C, B] without comparing their types.

Upvotes: 0

Kenan
Kenan

Reputation: 14094

df1 = pd.read_excel(xls)
df2 = pd.read_excel(xls2)

if all(df1.columns == df2.columns):
    pass
else:
    df1 = df1[df2.columns]

Upvotes: 1

rpanai
rpanai

Reputation: 13437

It doesn't really matter if the data comes from excel or other format. If you know that both have the same columns up to order you could just

import pandas as pd
df0 = pd.DataFrame([[1,2,3]], columns=["A","B","C"])
df1 = pd.DataFrame([[1,3,2]], columns=["A","C","B"])

print(df1[df0.columns])

   A  B  C
0  1  2  3

Upvotes: 0

Related Questions