Reputation: 533
I have a data frame with multindex, i want to convert it to a pivot table, do summarize on the columns, the data are:
import random
import pandas as pd
arrays = [[2,2,3,3,3,4,4,4,4,5,5,7,7],
[1,2,1,2,3,1,2,3,4,1,3,1,4]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names = ['first','second'])
data = pd.Series(random.sample(range(1,100),13), index = index)
data
first second
2 1 28
2 20
3 1 7
2 6
3 86
4 1 10
2 30
3 8
4 44
5 1 74
3 65
7 1 12
4 72
dtype: int64
I want to convert it to (inner value is sum of the column values):
second==1 second > 1
first
2 28 20
3 7 92
4 10 38
5 74 65
7 1 72
Is there an elegant way of doing this?
Thanks!
Upvotes: 2
Views: 984
Reputation: 7326
Modify your specific level index and restructure the data using groupby
. set_levels()
and get_level_values()
are useful when you modify your specific level index of pandas multiIndex.
data.index = data.index.set_levels(data.index.get_level_values(1).map(lambda x: 'second = 1' if x == 1 else 'second > 1'), level=1)
print data.unstack().fillna(0).groupby(axis=1, level=0).sum()
second second = 1 second > 1
first
2 44.0 46.0
3 110.0 31.0
4 63.0 150.0
5 74.0 0.0
7 7.0 86.0
Upvotes: 0
Reputation: 294508
import random
import pandas as pd
random.seed(314)
arrays = [[2,2,3,3,3,4,4,4,4,5,5,7,7],
[1,2,1,2,3,1,2,3,4,1,3,1,4]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names = ['first','second'])
data = pd.Series(random.sample(range(1,100),13), index = index)
data
first second
2 1 20
2 12
3 1 1
2 63
3 24
4 1 21
2 55
3 45
4 18
5 1 11
3 25
7 1 3
4 26
dtype: int64
def eq_one(x):
values = [x.ix[1], x.sum() - x.ix[1]]
index = ['second==1', 'second > 1']
return pd.Series(values, index=index)
data.unstack().apply(eq_one, axis=1)
second==1 second > 1
first
2 20.0 12.0
3 1.0 87.0
4 21.0 118.0
5 11.0 25.0
7 3.0 26.0
Upvotes: 1