How to access multi-index values in Pandas for calculations

Question

I am trying to calculate the distribution of values in a pandas pivot table. I am having trouble calculating the values and understanding the proper way to access multi-index series/columns.

Here is a sample dataframe:

import random
import pandas as pd

person = ['Jack', 'John', 'Mark']
randoms = ['A', 'B', 'C']
columns = ['id', 'person', 'state', 'choice', 'num']

# Build all rows first, then create the frame once.
# (DataFrame.append was removed in pandas 2.0, and appending row by row is slow.)
rows = [
    [i, random.choice(person), random.choice(randoms), random.choice(randoms), random.randrange(1, 250)]
    for i in range(500)
]
df = pd.DataFrame(rows, columns=columns)

pd.pivot_table(
    data = df,
    index = ['person','choice'],
    columns = ['state'],
    values='num',
    aggfunc = ['sum', 'count'],
    margins=True,
    margins_name='total',
    fill_value=0
)

The output of the pivot table looks something like this:

                 sum    sum    sum    sum  count  count  count  count
state              A      B      C  total      A      B      C  total
person choice
Jack   A        1519   1667   1460   4646     15     13     11     39
Jack   B        2078   1641   3200   6919     17     12     28     57
Jack   C        2166   1845   3575   7586     13     17     28     58
John   A        3241   2028   1880   7149     26     20     18     64
John   B        2467   2517   1200   6184     21     23     12     56
John   C        1585   2481   2791   6857     16     19     23     58
Mark   A        2320   2647   1858   6825     20     19     18     57
Mark   B        2807   2809   3116   8732     21     24     23     68
Mark   C        1953   1503   2558   6014     11     13     19     43
total          20136  19138  21638  60912    160    160    180    500

What I want to calculate is each state's share of the summed num value in every row, to get the distribution (that is, sum A / sum total, sum B / sum total, and so on), like so:

                 sum    sum    sum    sum  count  count  count  count
state              A      B      C  total      A      B      C  total  A_Pct  B_Pct  C_Pct  total_pct
person choice
Jack   A        1519   1667   1460   4646     15     13     11     39   0.33   0.36   0.31       1.00
Jack   B        2078   1641   3200   6919     17     12     28     57   0.30   0.24   0.46       1.00
Jack   C        2166   1845   3575   7586     13     17     28     58   0.29   0.24   0.47       1.00
John   A        3241   2028   1880   7149     26     20     18     64   0.45   0.28   0.26       1.00
John   B        2467   2517   1200   6184     21     23     12     56   0.40   0.41   0.19       1.00
John   C        1585   2481   2791   6857     16     19     23     58   0.23   0.36   0.41       1.00
Mark   A        2320   2647   1858   6825     20     19     18     57   0.34   0.39   0.27       1.00
Mark   B        2807   2809   3116   8732     21     24     23     68   0.32   0.32   0.36       1.00
Mark   C        1953   1503   2558   6014     11     13     19     43   0.32   0.25   0.43       1.00
total          20136  19138  21638  60912    160    160    180    500   0.33   0.31   0.36       1.00
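For reference, the calculation described above (each sum column divided by the sum total of its row) could be sketched roughly as follows. This is only one possible approach, not a confirmed answer: the four-row frame is made-up stand-in data, and the 'pct' column label and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

# Tiny made-up frame standing in for the random data in the question
df = pd.DataFrame({
    'person': ['Jack', 'Jack', 'John', 'John'],
    'choice': ['A', 'B', 'A', 'B'],
    'state':  ['A', 'B', 'A', 'C'],
    'num':    [10, 30, 20, 40],
})

pivot = pd.pivot_table(
    data=df,
    index=['person', 'choice'],
    columns=['state'],
    values='num',
    aggfunc=['sum', 'count'],
    margins=True,
    margins_name='total',
    fill_value=0,
)

# Selecting the top-level label returns the sub-frame with columns A, B, C, total
sums = pivot['sum']

# Divide every column by the row's total (axis=0 aligns the divisor on the row index)
pct = sums.div(sums['total'], axis=0).round(2)

# Put the result under its own top-level label ('pct' is a hypothetical name)
pct = pd.concat([pct], axis=1, keys=['pct'])

# Attach the percentage block next to the original sum/count blocks
result = pd.concat([pivot, pct], axis=1)
print(result)
```

Keeping the percentages under a separate top-level label means the column index stays two levels deep, so the new block can be selected the same way as the 'sum' and 'count' blocks.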

What's the best way to access the multi-index to calculate this? Is there a standard way to call indexes by name rather than levels?
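As a rough illustration of selecting by level name rather than by position, the sketch below uses `DataFrame.xs` and `get_level_values`, both of which accept a level name. The sample data is made up, and 'agg' is a hypothetical name given to the otherwise unnamed top column level that pivot_table produces when aggfunc is a list.

```python
import pandas as pd

# Made-up sample data, much smaller than the frame in the question
df = pd.DataFrame({
    'person': ['Jack', 'Jack', 'John', 'John'],
    'choice': ['A', 'B', 'A', 'B'],
    'state':  ['A', 'B', 'A', 'C'],
    'num':    [10, 30, 20, 40],
})

pivot = pd.pivot_table(
    df, index=['person', 'choice'], columns='state',
    values='num', aggfunc=['sum', 'count'], fill_value=0,
)

# The top column level has no name by default; name it so it can be referenced
# ('agg' is an arbitrary, hypothetical choice)
pivot.columns = pivot.columns.set_names(['agg', 'state'])

# Cross-sections by level name; the selected level is dropped from the result
sums = pivot.xs('sum', axis=1, level='agg')      # columns A, B, C
state_a = pivot.xs('A', axis=1, level='state')   # state A under each aggregate
jack = pivot.xs('Jack', axis=0, level='person')  # rows for one person

# A single cell: (row tuple, column tuple)
jack_sum_a = pivot.loc[('Jack', 'A'), ('sum', 'A')]

# All labels of one level, looked up by name
states = pivot.columns.get_level_values('state')
```

Row levels already carry names here because pivot_table takes them from the index argument, so only the column levels need naming.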
