Sapling
Sapling

Reputation: 533

multiindex dataframe to pivot table with a new column

I have a data frame with multindex, i want to convert it to a pivot table, do summarize on the columns, the data are:

import random
import pandas as pd
arrays = [[2,2,3,3,3,4,4,4,4,5,5,7,7],
      [1,2,1,2,3,1,2,3,4,1,3,1,4]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names = ['first','second'])
data = pd.Series(random.sample(range(1,100),13), index = index)
data

first  second
2      1         28
       2         20
3      1          7
       2          6
       3         86
4      1         10
       2         30
       3          8
       4         44
5      1         74
       3         65
7      1         12
       4         72
dtype: int64

I want to convert it to (inner value is sum of the column values):

      second==1    second > 1
first
2      28          20
3      7           92
4      10          38
5      74          65
7      1           72

Is there an elegant way of doing this?

Thanks!

Upvotes: 2

Views: 984

Answers (2)

su79eu7k
su79eu7k

Reputation: 7326

Modify your specific level index and restructure the data using groupby. set_levels() and get_level_values() are useful when you modify your specific level index of pandas multiIndex.

data.index = data.index.set_levels(data.index.get_level_values(1).map(lambda x: 'second = 1' if x == 1 else 'second > 1'), level=1)
print data.unstack().fillna(0).groupby(axis=1, level=0).sum()

second  second = 1  second > 1
first                         
2             44.0        46.0
3            110.0        31.0
4             63.0       150.0
5             74.0         0.0
7              7.0        86.0

Upvotes: 0

piRSquared
piRSquared

Reputation: 294508

Setup

import random
import pandas as pd
random.seed(314)

arrays = [[2,2,3,3,3,4,4,4,4,5,5,7,7],
      [1,2,1,2,3,1,2,3,4,1,3,1,4]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names = ['first','second'])
data = pd.Series(random.sample(range(1,100),13), index = index)
data

first  second
2      1         20
       2         12
3      1          1
       2         63
       3         24
4      1         21
       2         55
       3         45
       4         18
5      1         11
       3         25
7      1          3
       4         26
dtype: int64

Solution

def eq_one(x):
    values = [x.ix[1], x.sum() - x.ix[1]]
    index = ['second==1', 'second > 1']
    return pd.Series(values, index=index)

data.unstack().apply(eq_one, axis=1)

       second==1  second > 1
first                       
2           20.0        12.0
3            1.0        87.0
4           21.0       118.0
5           11.0        25.0
7            3.0        26.0

Upvotes: 1

Related Questions