Javier
Javier

Reputation: 513

How to sort multiindex column month names?

I have this multiindex df:

                       YEARS_TMAX TMAX YEARS_TMAX TMAX  YEARS_TMAX
MONTH                       April April    August August  December .....
CODE   NAME                                                   
000130 RICA PLAYA          21.0  31.5      21.0   21.5      22.0
000132 PUERTO PIZARRO      12.0  33.8      12.0   32.4      11.0
000134 PAPAYAL             23.0  33.2      22.0   22.4      21.0
000135 EL SALTO            22.0  33.6      23.0   22.8      22.0
000136 CAÑAVERAL           16.0  32.7      15.0   33.1      11.0
                        ...   ...       ...    ...       ...
158317 SUSAPAYA            19.0  17.6      19.0   17.3      21.0
158321 PALCA               16.0  19.3      17.0   19.8      16.0
158323 TALABAYA            12.0  17.6      13.0   17.5      13.0
158326 CAPAZO              17.0  13.6      17.0   13.0      19.0
158328 PAUCARANI           14.0  13.3      13.0   11.9      15.0

I want to sort columns by month name (and TMAX columns first) like this:

                           TMAX YEARS_TMAX TMAX YEARS_TMAX  TMAX
MONTH                      January January February February March .....
CODE   NAME                                                   
000130 RICA PLAYA          22.0  31.5      23.0   27.5      23.0
000132 PUERTO PIZARRO      17.0  32.8      18.0   30.4      18.0
000134 PAPAYAL             25.0  32.2      26.0   28.4      25.0
000135 EL SALTO            26.0  31.6      26.0   26.8      26.0
000136 CAÑAVERAL           16.0  32.7      18.0   31.1      15.0
                        ...   ...       ...    ...       ...
158317 SUSAPAYA            19.0  17.6      19.0   17.3      21.0
158321 PALCA               16.0  19.3      17.0   19.8      16.0
158323 TALABAYA            12.0  17.6      13.0   17.5      13.0
158326 CAPAZO              17.0  13.6      17.0   13.0      19.0
158328 PAUCARANI           14.0  13.3      13.0   11.9      15.0

So i wrote this code: source: Sort "Date" in Multi-Index

dates = pd.to_datetime(df.columns.get_level_values(1), format='%B')
df.columns = [df.columns.get_level_values(0), dates]
df = df.sort_index(axis=1, level=1)

To sort columns by month but dates is not creating month names, dates is creating random dates. How can i solve this?

Thanks in advance.

Upvotes: 1

Views: 250

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35626

Use a CategoricalDtype by creating an ordered dtype from calendar.month_name this will ensure the correct ordering by sort.

month_dtype = pd.CategoricalDtype(categories=list(month_name), ordered=True)
df.columns = [df.columns.get_level_values(0),
              df.columns.get_level_values(1).astype(month_dtype)]
df = df.sort_index(axis=1, level=[1, 0])

Sample Data and Imports:

from calendar import month_name

import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    columns=pd.MultiIndex.from_product([
        ['YEARS_TMAX', 'TMAX'],
        ['March', 'January', 'February']
    ])
)

df before sort:

  YEARS_TMAX                   TMAX                 
       March January February March January February
0          1       2        3     4       5        6
1          7       8        9    10      11       12

df after sort:

     TMAX YEARS_TMAX     TMAX YEARS_TMAX  TMAX YEARS_TMAX
  January    January February   February March      March
0       5          2        6          3     4          1
1      11          8       12          9    10          7

The datetime approach would also work, but converting back to strings would be necessary with DatetimeIndex.strftime:

df.columns = [df.columns.get_level_values(0),
              pd.to_datetime(df.columns.get_level_values(1), format='%B')]
df = df.sort_index(axis=1, level=[1, 0])

# convert back to strings
df.columns = [df.columns.get_level_values(0),
              df.columns.get_level_values(1).strftime('%B')]

df:

     TMAX YEARS_TMAX     TMAX YEARS_TMAX  TMAX YEARS_TMAX
  January    January February   February March      March
0       5          2        6          3     4          1
1      11          8       12          9    10          7

The drawback of this approach is level 1 is once again a string type which would need to be converted any time ordering needed to be changed as lexicographic ordering is not expected.

Upvotes: 1

Related Questions