user3662208
user3662208

Reputation: 3

multilevel columns set as index in pivot_table

I have a data frame (df) with multi column headers:
enter image description here

yearQ   YearC      YearS            Type1                   Type2
index   City       State    Year1   Year2   Year3   Year4   Year5   Year6
0     New York      NY      355     189     115     234     178     422
1     Los Angeles   CA      100     207     298     230     214     166
2     Chicago       IL      1360    300     211     121     355     435
3     Philadelphia  PA      270     156     455     232     532     355
4     Phoenix       AZ      270     234     112     432     344     116

I want to count the average number for each type. the final format should be like the following:

City      State  Type1                 Type2
New York  NY     avg of(355+189+115)   avg of (234+178+422)
.......

Can anybody give me a hint? Many thanks. Kath

Upvotes: 0

Views: 302

Answers (1)

jezrael
jezrael

Reputation: 862611

I think you can use groupby by first level of Multindex in columns with aggregate sum:

print (df.index)
MultiIndex(levels=[[0, 1, 2, 3, 4], 
                   ['Chicago', 'Los Angeles', 'New York', 'Philadelphia', 'Phoenix'], 
                   ['AZ', 'CA', 'IL', 'NY', 'PA']],
           labels=[[0, 1, 2, 3, 4], [2, 1, 0, 3, 4], [3, 1, 2, 4, 0]])

print (df.columns)
MultiIndex(levels=[['Type1', 'Type2'], 
                   ['Year1', 'Year2', 'Year3', 'Year4', 'Year5', 'Year6']],
           labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 3, 4, 5]],
           names=['YearQ', 'index'])


df = df.groupby(axis=1, level=0).sum()
print (df)
YearQ              Type1  Type2
0 New York     NY    659    834
1 Los Angeles  CA    605    610
2 Chicago      IL   1871    911
3 Philadelphia PA    881   1119
4 Phoenix      AZ    616    892

But maybe before is necessary set_index:

print (df.index)
Int64Index([0, 1, 2, 3, 4], dtype='int64')

print (df.columns)
MultiIndex(levels=[['Type1', 'Type2', 'YearC', 'YearS'], 
                   ['City', 'State', 'Year1', 'Year2', 'Year3', 'Year4', 'Year5', 'Year6']],
           labels=[[2, 3, 0, 0, 0, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6, 7]],
           names=['YearQ', 'index'])


df = df.set_index([('YearC','City'), ('YearS','State')])
df = df.groupby(axis=1, level=0).sum()
print (df)
YearQ                         Type1  Type2
(YearC, City) (YearS, State)              
New York      NY                659    834
Los Angeles   CA                605    610
Chicago       IL               1871    911
Philadelphia  PA                881   1119
Phoenix       AZ                616    892

Upvotes: 2

Related Questions