OParker
OParker

Reputation: 295

Import from Excel into Multi Indexed dataframe

I'm trying to import from Excel into a dataframe and maintain a multi indexed format.

This import is fine:

def import_cp(cp_sheet_name):
    xl = pd.ExcelFile('FileNameA.xlsx')
    df_first = xl.parse(cp_sheet_name)
    df_second = xl.parse(cp_sheet_name)
    # there's many more
    return df_first, df_second

df_first = import_cp("Sheet 1")

The Excel is formatted like this:

|        |       Alpha       |       Bravo      |    Charlie     |
|Position|  Area   |  Gain   |   Area  |  Gain  |  Area  |  Gain |
|    1   |   0.5   |   1.1   |    0.5  |  1.1   |   1.7  |  1.6  |
|    2   |   0.6   |   1.0   |    0.6  |  1.0   |   1.5  |  1.4  |

Where the Alpha Bravo cells are merged.

When I import I get:

(  |Unnamed: 0 Alpha| Unnamed: 2 Bravo| Unnamed: 4 Charlie|
0  |Position    Area|   Gain    Area  |  Gain    Area     |
1  |    1     0.5   |   1.17    0.5   |    1.13     0.5   |
2  |    2     0.5   |   1.17    0.5   |    1.13     0.5   |

I tried using header=0 however that did not have change much and a fillna is not ideal as I don't want Alpha Alpha Bravo Bravo Charlie Charlie

Any help on this would be appreciated.

Upvotes: 3

Views: 2563

Answers (1)

jezrael
jezrael

Reputation: 863256

I think you need add parameter header=[0,1] to read_excel for reading columns to MultiIndex, index_col=0 for reading first column to index and sheetname='sheet1' for reading sheet with name sheet1. Then you can reset columns names by rename_axis (new in pandas 0.18.0)

import pandas as pd

df = pd.read_excel('test.xlsx', header=[0,1], index_col=0, sheetname='sheet1')
print df
         Alpha      Bravo      Charlie     
Position  Area Gain  Area Gain    Area Gain
1          0.5  1.1   0.5  1.1     1.7  1.6
2          0.6  1.0   0.6  1.0     1.5  1.4

df = df.rename_axis((None,None), axis=1)
print df
  Alpha      Bravo      Charlie     
   Area Gain  Area Gain    Area Gain
1   0.5  1.1   0.5  1.1     1.7  1.6
2   0.6  1.0   0.6  1.0     1.5  1.4

print df.index
Int64Index([1, 2], dtype='int64')

print df.columns
MultiIndex(levels=[[u'Alpha', u'Bravo', u'Charlie'],
                   [u'Area', u'Gain']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])

Upvotes: 2

Related Questions