marshackVB
marshackVB

Reputation: 43

Pandas Pivot Table - reoganize order of multi-index

I have created a pivot table with a three-level multi-index (Group, Product, and State). The state level is automatically sorted alphabetically, but I need to change the ordering to a custom, non-alphabetical ordering. I may need to also re-order the Group and Product levels in similar fashion.

pivot = data.pivot_table(rows=['Group', 'Product', 'State'], 
                     values = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 
                               'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 
                               'NOV', 'DEC'], fill_value=0, margins=True aggfunc=sum) 

cols = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']

#I used this method to prevent the month names from being alphabetically sorted 
pivot = pivot[cols]

A portion of the pivot table looks like this...

                                 JUN     JUL    AUG
Group    Product     State   
Group A  Product A   AZ          0       0      0
                     CO          0       0      0
                     GA          0       0      0
                     IL          0       0      0
                     IN          0       0      0
                     KS          0       0      0
                     MN          0       0      0
                     MO          0       0      0

I need the ordering of the states to be as follows...

state_order = ['AZ','CO','ID','NV','OR','UT','WA','IA','KS','MN','MO','NE','ND','SD','GA','IL','IN','OH','WI']

I tried the reindex_axis() fuction, feeding in my list above and specifying level=2. However, the states were still sorted in alphabetical order.

Any insights into a fix would be much appreciated.

Upvotes: 1

Views: 3518

Answers (2)

Gecko
Gecko

Reputation: 1408

I got bitten by a similar problem, I will post a sample solution. You can use a dummy variable and groupby.

import pandas as pd 
import numpy as np 

index = pd.MultiIndex.from_tuples(zip(['a', 'a', 'a', 'b', 'b', 'b'],
                                      [0, 0, 0, 1, 1, 1],
                                      ['x', 'xx', 'xxx', 'x', 'xx', 'xxx']),
                                      names=['A', 'B', 'C'])
df = pd.DataFrame(np.random.rand(6, 3), index = index)
>>> df
                0         1         2
A B C                                
a 0 x    0.839870  0.763803  0.847632
    xx   0.619066  0.715492  0.467518
    xxx  0.917468  0.923521  0.278665
b 1 x    0.660889  0.209247  0.502107
    xx   0.069925  0.889308  0.836755
    xxx  0.967187  0.650482  0.138759



desired_order = ['xxx', 'xx', 'x']
df = df.reset_index(2)
mapping = { _ : desired_order.index(_) for _ in df['C'] }

df['Dummy'] = df['C'].map(lambda x: mapping[x]) #gives desired order
df = df.groupby(level=['A', 'B']).apply(lambda x: x.sort('Dummy'))
df.drop('Dummy', axis=1)


>>> df 
       C         0         1         2
A B                                   
a 0  xxx  0.273731  0.561262  0.970034
  0   xx  0.859063  0.459765  0.921475
  0    x  0.640554  0.045410  0.512320
b 1  xxx  0.678473  0.380712  0.252676
  1   xx  0.501426  0.577250  0.317702
  1    x  0.586227  0.927453  0.794912

There may be a better way using Categorical variables introduced in pandas 0.15, but I dont know a easier solution.

Upvotes: 0

Bob Haffner
Bob Haffner

Reputation: 8483

You can try changing the data type of State to category.

data["state"] = data["state"].astype("category")

then set the sort order

data["state"].cat.set_categories(['AZ','CO','ID','NV','OR','UT','WA','IA','KS',
                  'MN','MO','NE','ND','SD','GA','IL','IN','OH','WI'],inplace=True)

EDIT: fyi, the category dtype is relatively new. 0.15.0 i believe

Upvotes: 4

Related Questions