Reputation: 295
I'm trying to import from Excel into a dataframe and maintain a multi indexed format.
This import is fine:
def import_cp(cp_sheet_name):
xl = pd.ExcelFile('FileNameA.xlsx')
df_first = xl.parse(cp_sheet_name)
df_second = xl.parse(cp_sheet_name)
# there's many more
return df_first, df_second
df_first = import_cp("Sheet 1")
The Excel is formatted like this:
| | Alpha | Bravo | Charlie |
|Position| Area | Gain | Area | Gain | Area | Gain |
| 1 | 0.5 | 1.1 | 0.5 | 1.1 | 1.7 | 1.6 |
| 2 | 0.6 | 1.0 | 0.6 | 1.0 | 1.5 | 1.4 |
Where the Alpha Bravo cells are merged.
When I import I get:
( |Unnamed: 0 Alpha| Unnamed: 2 Bravo| Unnamed: 4 Charlie|
0 |Position Area| Gain Area | Gain Area |
1 | 1 0.5 | 1.17 0.5 | 1.13 0.5 |
2 | 2 0.5 | 1.17 0.5 | 1.13 0.5 |
I tried using header=0
however that did not have change much and a fillna
is not ideal as I don't want Alpha Alpha Bravo Bravo Charlie Charlie
Any help on this would be appreciated.
Upvotes: 3
Views: 2563
Reputation: 863256
I think you need add parameter header=[0,1]
to read_excel
for reading columns to MultiIndex
, index_col=0
for reading first column to index
and sheetname='sheet1'
for reading sheet
with name sheet1
. Then you can reset columns names by rename_axis
(new in pandas
0.18.0
)
import pandas as pd
df = pd.read_excel('test.xlsx', header=[0,1], index_col=0, sheetname='sheet1')
print df
Alpha Bravo Charlie
Position Area Gain Area Gain Area Gain
1 0.5 1.1 0.5 1.1 1.7 1.6
2 0.6 1.0 0.6 1.0 1.5 1.4
df = df.rename_axis((None,None), axis=1)
print df
Alpha Bravo Charlie
Area Gain Area Gain Area Gain
1 0.5 1.1 0.5 1.1 1.7 1.6
2 0.6 1.0 0.6 1.0 1.5 1.4
print df.index
Int64Index([1, 2], dtype='int64')
print df.columns
MultiIndex(levels=[[u'Alpha', u'Bravo', u'Charlie'],
[u'Area', u'Gain']],
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])
Upvotes: 2