Reputation: 3
I have a data frame (df) with multi column headers:
enter image description here
yearQ YearC YearS Type1 Type2
index City State Year1 Year2 Year3 Year4 Year5 Year6
0 New York NY 355 189 115 234 178 422
1 Los Angeles CA 100 207 298 230 214 166
2 Chicago IL 1360 300 211 121 355 435
3 Philadelphia PA 270 156 455 232 532 355
4 Phoenix AZ 270 234 112 432 344 116
I want to count the average number for each type. the final format should be like the following:
City State Type1 Type2
New York NY avg of(355+189+115) avg of (234+178+422)
.......
Can anybody give me a hint? Many thanks. Kath
Upvotes: 0
Views: 302
Reputation: 862611
I think you can use groupby
by first level of Multindex
in columns with aggregate sum
:
print (df.index)
MultiIndex(levels=[[0, 1, 2, 3, 4],
['Chicago', 'Los Angeles', 'New York', 'Philadelphia', 'Phoenix'],
['AZ', 'CA', 'IL', 'NY', 'PA']],
labels=[[0, 1, 2, 3, 4], [2, 1, 0, 3, 4], [3, 1, 2, 4, 0]])
print (df.columns)
MultiIndex(levels=[['Type1', 'Type2'],
['Year1', 'Year2', 'Year3', 'Year4', 'Year5', 'Year6']],
labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 3, 4, 5]],
names=['YearQ', 'index'])
df = df.groupby(axis=1, level=0).sum()
print (df)
YearQ Type1 Type2
0 New York NY 659 834
1 Los Angeles CA 605 610
2 Chicago IL 1871 911
3 Philadelphia PA 881 1119
4 Phoenix AZ 616 892
But maybe before is necessary set_index
:
print (df.index)
Int64Index([0, 1, 2, 3, 4], dtype='int64')
print (df.columns)
MultiIndex(levels=[['Type1', 'Type2', 'YearC', 'YearS'],
['City', 'State', 'Year1', 'Year2', 'Year3', 'Year4', 'Year5', 'Year6']],
labels=[[2, 3, 0, 0, 0, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6, 7]],
names=['YearQ', 'index'])
df = df.set_index([('YearC','City'), ('YearS','State')])
df = df.groupby(axis=1, level=0).sum()
print (df)
YearQ Type1 Type2
(YearC, City) (YearS, State)
New York NY 659 834
Los Angeles CA 605 610
Chicago IL 1871 911
Philadelphia PA 881 1119
Phoenix AZ 616 892
Upvotes: 2