Reputation: 43
I have created a pivot table with a three-level multi-index (Group, Product, and State). The state level is automatically sorted alphabetically, but I need to change the ordering to a custom, non-alphabetical ordering. I may need to also re-order the Group and Product levels in similar fashion.
pivot = data.pivot_table(rows=['Group', 'Product', 'State'],
values = ['JAN', 'FEB', 'MAR', 'APR', 'MAY',
'JUN', 'JUL', 'AUG', 'SEP', 'OCT',
'NOV', 'DEC'], fill_value=0, margins=True aggfunc=sum)
cols = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
#I used this method to prevent the month names from being alphabetically sorted
pivot = pivot[cols]
A portion of the pivot table looks like this...
JUN JUL AUG
Group Product State
Group A Product A AZ 0 0 0
CO 0 0 0
GA 0 0 0
IL 0 0 0
IN 0 0 0
KS 0 0 0
MN 0 0 0
MO 0 0 0
I need the ordering of the states to be as follows...
state_order = ['AZ','CO','ID','NV','OR','UT','WA','IA','KS','MN','MO','NE','ND','SD','GA','IL','IN','OH','WI']
I tried the reindex_axis() fuction, feeding in my list above and specifying level=2. However, the states were still sorted in alphabetical order.
Any insights into a fix would be much appreciated.
Upvotes: 1
Views: 3518
Reputation: 1408
I got bitten by a similar problem, I will post a sample solution. You can use a dummy variable and groupby.
import pandas as pd
import numpy as np
index = pd.MultiIndex.from_tuples(zip(['a', 'a', 'a', 'b', 'b', 'b'],
[0, 0, 0, 1, 1, 1],
['x', 'xx', 'xxx', 'x', 'xx', 'xxx']),
names=['A', 'B', 'C'])
df = pd.DataFrame(np.random.rand(6, 3), index = index)
>>> df
0 1 2
A B C
a 0 x 0.839870 0.763803 0.847632
xx 0.619066 0.715492 0.467518
xxx 0.917468 0.923521 0.278665
b 1 x 0.660889 0.209247 0.502107
xx 0.069925 0.889308 0.836755
xxx 0.967187 0.650482 0.138759
desired_order = ['xxx', 'xx', 'x']
df = df.reset_index(2)
mapping = { _ : desired_order.index(_) for _ in df['C'] }
df['Dummy'] = df['C'].map(lambda x: mapping[x]) #gives desired order
df = df.groupby(level=['A', 'B']).apply(lambda x: x.sort('Dummy'))
df.drop('Dummy', axis=1)
>>> df
C 0 1 2
A B
a 0 xxx 0.273731 0.561262 0.970034
0 xx 0.859063 0.459765 0.921475
0 x 0.640554 0.045410 0.512320
b 1 xxx 0.678473 0.380712 0.252676
1 xx 0.501426 0.577250 0.317702
1 x 0.586227 0.927453 0.794912
There may be a better way using Categorical variables introduced in pandas 0.15, but I dont know a easier solution.
Upvotes: 0
Reputation: 8483
You can try changing the data type of State to category.
data["state"] = data["state"].astype("category")
then set the sort order
data["state"].cat.set_categories(['AZ','CO','ID','NV','OR','UT','WA','IA','KS',
'MN','MO','NE','ND','SD','GA','IL','IN','OH','WI'],inplace=True)
EDIT: fyi, the category dtype is relatively new. 0.15.0 i believe
Upvotes: 4