Yang
Yang

Reputation: 199

Pandas: How to read an excel file defining several columns to be multi indexes?

I have a data frame that every row contains an office location object with several attribute like Global Region,Primary Function, and several energy consumption data as numerical values followed. The names of all columns is like below:

['Global Region',  
'Primary Function',  
'Subsidiaries',  
'T&D Loss Rate Category',  
'Type',  
'Ref',  
'Acquisition date',  
'Disposal date',  
'Corporate Admin Approver',  
'Data Providers',  
'Initiative administrator',  
'Initiative approver',  
'Initiative user',  
'Invoice owner',  
'Apr to Jun 2012',  
'Jul to Sep 2012',  
'Oct to Dec 2012',  
'Jan to Mar 2013',  
'Apr to Jun 2013',  
'Jul to Sep 2013',  
'Oct to Dec 2013',  
'Jan to Mar 2014',  
'Apr to Jun 2014',  
'Jul to Sep 2014',  
'Oct to Dec 2014',  
'Jan to Mar 2015',  
'Apr to Jun 2015',  
'Jul to Sep 2015',  
'Oct to Dec 2015',  
'Jan to Mar 2016']  

How can I sort different locations and view the data based on different attributes, e.g. based on primary function or global region. I can see the average energy consumption or rank energy intensity of all locations whose primary function is R&D.

I thought of multi index, but I didn't know how to do it. I tried this:

test = xls.parse('Sheet1',index_col=['Lenovo Global Region','Primary Function', 'Subsidiaries', 'Type','Acquisition date','Disposal date','Country'])

It didn't work, error said I could only use numbers not string, so I tried this:

test = xls.parse('Sheet1',index_col=0,1,3,4,5,7,9,10)

Still didn't work. Anyone has good suggestions?

Upvotes: 3

Views: 3185

Answers (1)

jezrael
jezrael

Reputation: 863256

You can use read_excel with parameter index_col which contains list of positions of necessary columns:

Sample:

df = pd.read_excel('test.xlsx', sheetname='Sheet1', index_col=[0,1,3])
print (df)
                                                      Subsidiaries Type  Ref
Global Region Primary Function T&D Loss Rate Category                       
1             1                c                                 a    s   10
2             2                c                                 b    d   20
3             3                d                                 c    d   30

excel

Reading a multiindex.

So if add [] it can works:

test = xls.parse('Sheet1',index_col=[0,1,3,4,5,7,9,10])

Upvotes: 2

Related Questions