Reputation: 187
I have a pandas dataframe that summarises sales by calendar month & outputs something like:
Month level_0 UNIQUE_ID 102018 112018 12018 122017 122018 22018 32018 42018 52018 62018 72018 82018 92018
0 SOLD_QUANTITY 01 3692.0 5182.0 3223.0 1292.0 2466.0 2396.0 2242.0 2217.0 3590.0 2593.0 1665.0 3371.0 3069.0
1 SOLD_QUANTITY 011 3.0 6.0 NaN NaN 7.0 5.0 2.0 1.0 5.0 NaN 1.0 1.0 3.0
2 SOLD_QUANTITY 02 370.0 130.0 NaN NaN 200.0 NaN NaN 269.0 202.0 NaN 201.0 125.0 360.0
3 SOLD_QUANTITY 03 2.0 6.0 NaN NaN 2.0 1.0 NaN 6.0 11.0 9.0 2.0 3.0 5.0
4 SOLD_QUANTITY 08 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 175.0 NaN NaN
I want to be able to programmatically re-arrange the column headers in ascending date order (eg starting 122017, 12018, 22018...). I need to do it in a way that is programmatic as every way the report runs, it will be a different list of months as it runs every month for last 365 days.
The index data type:
Index(['level_0', 'UNIQUE_ID', '102018', '112018', '12018', '122017', '122018',
'22018', '32018', '42018', '52018', '62018', '72018', '82018', '92018'],
dtype='object', name='Month')
Upvotes: 2
Views: 2045
Reputation: 862771
Use set_index
for only date
s columns, convert them to datetime
s and get order positions by argsort
, then change ordering with iloc
:
df = df.set_index(['level_0','UNIQUE_ID'])
df = df.iloc[:, pd.to_datetime(df.columns, format='%m%Y').argsort()].reset_index()
print (df)
level_0 UNIQUE_ID 122017 12018 22018 32018 42018 52018 \
0 SOLD_QUANTITY 1 1292.0 3223.0 2396.0 2242.0 2217.0 3590.0
1 SOLD_QUANTITY 11 NaN NaN 5.0 2.0 1.0 5.0
2 SOLD_QUANTITY 2 NaN NaN NaN NaN 269.0 202.0
3 SOLD_QUANTITY 3 NaN NaN 1.0 NaN 6.0 11.0
4 SOLD_QUANTITY 8 NaN NaN NaN NaN NaN NaN
62018 72018 82018 92018 102018 112018 122018
0 2593.0 1665.0 3371.0 3069.0 3692.0 5182.0 2466.0
1 NaN 1.0 1.0 3.0 3.0 6.0 7.0
2 NaN 201.0 125.0 360.0 370.0 130.0 200.0
3 9.0 2.0 3.0 5.0 2.0 6.0 2.0
4 NaN 175.0 NaN NaN NaN NaN NaN
Another idea is create month period index by DatetimeIndex.to_period
, so is possible use sort_index
:
df = df.set_index(['level_0','UNIQUE_ID'])
df.columns = pd.to_datetime(df.columns, format='%m%Y').to_period('m')
#alternative for convert to datetimes
#df.columns = pd.to_datetime(df.columns, format='%m%Y')
df = df.sort_index(axis=1).reset_index()
print (df)
level_0 UNIQUE_ID 2017-12 2018-01 2018-02 2018-03 2018-04 \
0 SOLD_QUANTITY 1 1292.0 3223.0 2396.0 2242.0 2217.0
1 SOLD_QUANTITY 11 NaN NaN 5.0 2.0 1.0
2 SOLD_QUANTITY 2 NaN NaN NaN NaN 269.0
3 SOLD_QUANTITY 3 NaN NaN 1.0 NaN 6.0
4 SOLD_QUANTITY 8 NaN NaN NaN NaN NaN
2018-05 2018-06 2018-07 2018-08 2018-09 2018-10 2018-11 2018-12
0 3590.0 2593.0 1665.0 3371.0 3069.0 3692.0 5182.0 2466.0
1 5.0 NaN 1.0 1.0 3.0 3.0 6.0 7.0
2 202.0 NaN 201.0 125.0 360.0 370.0 130.0 200.0
3 11.0 9.0 2.0 3.0 5.0 2.0 6.0 2.0
4 NaN NaN 175.0 NaN NaN NaN NaN NaN
Upvotes: 4