pandas pivot table - ordered categories causing unexpected margins

Question

Using python 3.7 and pandas 0.23.4. I'm trying to make pivot tables with ordered categorical data. If I include margins, the subtotals don't seem to be in the correct order.

import pandas as pd
m='male'
f='female'

data = {'num': [0,1,2,3,4,5,6,7,8,9],
        'age': [1,2,2,3,3,3,3,1,2,3],
        'sex': [f,f,f,f,f,f,f,m,m,m]}
df = pd.DataFrame(data=data)

df['age1'] = pd.Categorical(df['age'],categories=[3,2,1],ordered=True)
df['sex1'] = pd.Categorical(df['sex'],categories=[m,f],ordered=True)
pd.pivot_table(df,values='num',index='age1',columns='sex1',aggfunc='count',margins=True)

Output (incorrect margins order, the 'All' sums are not in the right rows or columns):

sex1  male  female  All
age1                   
3        1       4    2
2        1       2    3
1        1       1    5
All      7       3   10

Expected output (correct margins order):

sex1  male  female  All
age1                   
3        1       4    5
2        1       2    3
1        1       1    2
All      3       7   10

In this example it might be better to create the categories with ordered=False. However much of my data is automatically ordered (using pd.cut) so I would like to know if this is intended behavior, and if so, is there a way to remove the ordering on a category that was created with an order?

Edit- here's an example using pd.cut. I changed the 'age' column values to appear in reverse of the cut order.

import pandas as pd
m='male'
f='female'
data = {'num': [0,1,2,3,4,5,6,7,8,9],
        'age': [3,3,3,3,2,2,1,1,2,3],
        'sex': [f,f,f,f,f,f,f,m,m,m]}
df = pd.DataFrame(data=data)
df['cut'] = pd.cut(df['age'],[1,2,3,4],labels=['<2','2','>2'],right=False)
pd.pivot_table(df,values='num',index='cut',columns='sex',aggfunc='count',margins=True)

Output, again with incorrect row margins (corresponding to the ordered category from pd.cut).

sex  female  male  All
cut                   
<2        1     1    5
2         2     1    3
>2        4     1    2
All       7     3   10

Expected output would be the correct row margin order.

sex  female  male  All
cut                   
<2        1     1    2
2         2     1    3
>2        4     1    5
All       7     3   10

pandas pivot table - ordered categories causing unexpected margins

Answers (1)

Related Questions