Reputation: 513
I have this multiindex df
:
YEARS_TMAX TMAX YEARS_TMAX TMAX YEARS_TMAX
MONTH April April August August December .....
CODE NAME
000130 RICA PLAYA 21.0 31.5 21.0 21.5 22.0
000132 PUERTO PIZARRO 12.0 33.8 12.0 32.4 11.0
000134 PAPAYAL 23.0 33.2 22.0 22.4 21.0
000135 EL SALTO 22.0 33.6 23.0 22.8 22.0
000136 CAÑAVERAL 16.0 32.7 15.0 33.1 11.0
... ... ... ... ...
158317 SUSAPAYA 19.0 17.6 19.0 17.3 21.0
158321 PALCA 16.0 19.3 17.0 19.8 16.0
158323 TALABAYA 12.0 17.6 13.0 17.5 13.0
158326 CAPAZO 17.0 13.6 17.0 13.0 19.0
158328 PAUCARANI 14.0 13.3 13.0 11.9 15.0
I want to sort columns by month name (and TMAX columns first) like this:
TMAX YEARS_TMAX TMAX YEARS_TMAX TMAX
MONTH January January February February March .....
CODE NAME
000130 RICA PLAYA 22.0 31.5 23.0 27.5 23.0
000132 PUERTO PIZARRO 17.0 32.8 18.0 30.4 18.0
000134 PAPAYAL 25.0 32.2 26.0 28.4 25.0
000135 EL SALTO 26.0 31.6 26.0 26.8 26.0
000136 CAÑAVERAL 16.0 32.7 18.0 31.1 15.0
... ... ... ... ...
158317 SUSAPAYA 19.0 17.6 19.0 17.3 21.0
158321 PALCA 16.0 19.3 17.0 19.8 16.0
158323 TALABAYA 12.0 17.6 13.0 17.5 13.0
158326 CAPAZO 17.0 13.6 17.0 13.0 19.0
158328 PAUCARANI 14.0 13.3 13.0 11.9 15.0
So i wrote this code: source: Sort "Date" in Multi-Index
dates = pd.to_datetime(df.columns.get_level_values(1), format='%B')
df.columns = [df.columns.get_level_values(0), dates]
df = df.sort_index(axis=1, level=1)
To sort columns by month but dates
is not creating month names, dates
is creating random dates.
How can i solve this?
Thanks in advance.
Upvotes: 1
Views: 250
Reputation: 35626
Use a CategoricalDtype by creating an ordered dtype from calendar.month_name this will ensure the correct ordering by sort.
month_dtype = pd.CategoricalDtype(categories=list(month_name), ordered=True)
df.columns = [df.columns.get_level_values(0),
df.columns.get_level_values(1).astype(month_dtype)]
df = df.sort_index(axis=1, level=[1, 0])
Sample Data and Imports:
from calendar import month_name
import pandas as pd
df = pd.DataFrame(
[[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
columns=pd.MultiIndex.from_product([
['YEARS_TMAX', 'TMAX'],
['March', 'January', 'February']
])
)
df
before sort:
YEARS_TMAX TMAX
March January February March January February
0 1 2 3 4 5 6
1 7 8 9 10 11 12
df
after sort:
TMAX YEARS_TMAX TMAX YEARS_TMAX TMAX YEARS_TMAX
January January February February March March
0 5 2 6 3 4 1
1 11 8 12 9 10 7
The datetime approach would also work, but converting back to strings would be necessary with DatetimeIndex.strftime:
df.columns = [df.columns.get_level_values(0),
pd.to_datetime(df.columns.get_level_values(1), format='%B')]
df = df.sort_index(axis=1, level=[1, 0])
# convert back to strings
df.columns = [df.columns.get_level_values(0),
df.columns.get_level_values(1).strftime('%B')]
df
:
TMAX YEARS_TMAX TMAX YEARS_TMAX TMAX YEARS_TMAX
January January February February March March
0 5 2 6 3 4 1
1 11 8 12 9 10 7
The drawback of this approach is level 1 is once again a string type which would need to be converted any time ordering needed to be changed as lexicographic ordering is not expected.
Upvotes: 1