mikebmassey
mikebmassey

Reputation: 8594

How to access multi-index values in Pandas for calculations

I am trying to calculate the distribution of values in a pandas pivot table. I am having trouble calculating the values and understanding the proper way to access multi-index series/columns.

Here is a sample dataframe:

person = ['Jack', 'John', 'Mark']
randoms = ['A', 'B', 'C']

df = pd.DataFrame([], columns=['id', 'person','state', 'choice', 'num'])

for i in range(0,500):
    row = [i, random.choice(person), random.choice(randoms), random.choice(randoms), random.randrange(1, 250)]
    append_df = pd.DataFrame([row], columns=['id', 'person','state', 'choice', 'num'])
    df = df.append(append_df)

df.reset_index(drop=True, inplace=True)

pd.pivot_table(
    data = df,
    index = ['person','choice'],
    columns = ['state'],
    values='num',
    aggfunc = ['sum', 'count'],
    margins=True,
    margins_name='total',
    fill_value=0
)

The output of the pivot table looks something like this:

        sum sum sum sum count   count   count   count
state       A   B   C   total   A   B   C   total
person  choice                              
Jack    A   1519    1667    1460    4646    15  13  11  39
Jack    B   2078    1641    3200    6919    17  12  28  57
Jack    C   2166    1845    3575    7586    13  17  28  58
John    A   3241    2028    1880    7149    26  20  18  64
John    B   2467    2517    1200    6184    21  23  12  56
John    C   1585    2481    2791    6857    16  19  23  58
Mark    A   2320    2647    1858    6825    20  19  18  57
Mark    B   2807    2809    3116    8732    21  24  23  68
Mark    C   1953    1503    2558    6014    11  13  19  43
total       20136   19138   21638   60912   160 160 180 500

What I am trying to calculate is the % of the num value by column/index to get the distribution, like so [The calc would be sum A / sum total, sum B / sum total, etc]:

        sum sum sum sum count   count   count   count               
state       A   B   C   total   A   B   C   total   A_Pct   B_Pct   C_Pct   total_pct
person  choice                                              
Jack    A   1519    1667    1460    4646    15  13  11  39  0.33    0.36    0.31    1.00
Jack    B   2078    1641    3200    6919    17  12  28  57  0.30    0.24    0.46    1.00
Jack    C   2166    1845    3575    7586    13  17  28  58  0.29    0.24    0.47    1.00
John    A   3241    2028    1880    7149    26  20  18  64  0.45    0.28    0.26    1.00
John    B   2467    2517    1200    6184    21  23  12  56  0.40    0.41    0.19    1.00
John    C   1585    2481    2791    6857    16  19  23  58  0.23    0.36    0.41    1.00
Mark    A   2320    2647    1858    6825    20  19  18  57  0.34    0.39    0.27    1.00
Mark    B   2807    2809    3116    8732    21  24  23  68  0.32    0.32    0.36    1.00
Mark    C   1953    1503    2558    6014    11  13  19  43  0.32    0.25    0.43    1.00
total       20136   19138   21638   60912   160 160 180 500 0.33    0.31    0.36    1.00

What's the best way to access the multi-index to calculate this? Is there a standard way to call indexes by name rather than levels?

Upvotes: 0

Views: 83

Answers (1)

Corralien
Corralien

Reputation: 120459

You can use:

# piv = pd.pivot_table(...
pct = piv['sum'].iloc[:, :-1].div(piv['sum'].iloc[:, -1], axis=0)
piv = piv.join(pd.concat([pct, pct.sum(1).rename('total')], 
                         keys=['pct', 'pct'], axis=1))

Output:

>>> piv
                 sum                      count                       pct                          
state              A      B      C  total     A    B    C total         A         B         C total
person choice                                                                                      
Jack   A        2084   1836   2453   6373    19   16   20    55  0.327005  0.288090  0.384905   1.0
       B        2875   1491   1925   6291    24   12   18    54  0.457002  0.237005  0.305993   1.0
       C        2208   3513   2084   7805    19   28   18    65  0.282896  0.450096  0.267008   1.0
John   A        2383   2755   1554   6692    19   21   11    51  0.356097  0.411686  0.232218   1.0
       B        1779   1828   2790   6397    18   16   19    53  0.278099  0.285759  0.436142   1.0
       C        1706   1841   3311   6858    12   16   24    52  0.248761  0.268446  0.482794   1.0
Mark   A        1633   2173   1117   4923    20   18   14    52  0.331708  0.441398  0.226894   1.0
       B        2099   2533   2253   6885    18   23   21    62  0.304866  0.367901  0.327233   1.0
       C        2445   1925   2652   7022    20   14   22    56  0.348191  0.274138  0.377670   1.0
total          19212  19895  20139  59246   169  164  167   500  0.324275  0.335803  0.339922   1.0

Upvotes: 1

Related Questions