Reputation: 21
I have two excel
Excel 1 :
A,B,C
1,2,3
Excel 2 :
A,C,B
1,3,2
How can i re position the excel 2 base on excel 1 column ?
so that A,C,B and become A,B,C
I use the following code to check column orders:
comparison_Columns = pd.read_excel(xls).columns == pd.read_excel(xls2).columns
if all(comparison_Columns):
pass
else:
print('Wrong column order !!!!! ')
Upvotes: 0
Views: 76
Reputation: 130
This code snippet will work fine:
def areColumnSame(df1, df2, checkTypes = True):
if checkTypes:
type1 = dict(df1.dtypes)
type2 = dict(df2.dtypes)
return type1 == type2
else:
col1 = list(df1.columns)
col2 = list(df2.columns)
col1.sort()
col2.sort()
return col1 == col2
To show how the above code works let us explore examples:
Consider three excel files:
| A | B | C |
|---|---|---|
| 1 | 2 | 3 |
| 4 | 5 | 6 |
| A | C | B |
|---|---|---|
| 1 | 3 | 2 |
| 4 | 6 | 5 |
| A | B | C | A.1 | B.1 | C.1 |
|---|---|---|-----|-----|-----|
| 1 | 2 | 3 | 1 | 2 | 3 |
| 4 | 5 | 6 | 4 | 5 | 6 |
Now for the first file the dict(df.dtypes)
is shown below:
{'A': dtype('int64'),
'B': dtype('int64'),
'C': dtype('int64')}
Similarly for other two files:
{'A': dtype('int64'),
'C': dtype('int64'),
'B': dtype('int64')}
and
{'A': dtype('int64'),
'B': dtype('int64'),
'C': dtype('int64'),
'A.1': dtype('int64'),
'B.1': dtype('int64'),
'C.1': dtype('int64')}
We just need to compare these dictionaries to get the result. At the same time, it also checks for the type of data.
Hence for the comparison between the first two files will be true whereas the comparison with third will return false.
But you can always disable the type-checking in which case we will just check whether [A, B, C]
is the same as [A, C, B]
without comparing their types.
Upvotes: 0
Reputation: 14094
df1 = pd.read_excel(xls)
df2 = pd.read_excel(xls2)
if all(df1.columns == df2.columns):
pass
else:
df1 = df1[df2.columns]
Upvotes: 1
Reputation: 13437
It doesn't really matter if the data comes from excel or other format. If you know that both have the same columns up to order you could just
import pandas as pd
df0 = pd.DataFrame([[1,2,3]], columns=["A","B","C"])
df1 = pd.DataFrame([[1,3,2]], columns=["A","C","B"])
print(df1[df0.columns])
A B C
0 1 2 3
Upvotes: 0