Reputation: 10677
I have a multi-indexed DataFrame
with names attached to the column levels. I'd like to be able to easily shuffle the columns around so that they match the order specified by the user. Since this is down the pipeline, I'm not able to use this recommended solution and order them properly at creation time.
I have a data table that looks (something) like
Experiment BASE IWWGCW IWWGDW
Lead Time 24 48 24 48 24 48
2010-11-27 12:00:00 0.997 0.991 0.998 0.990 0.998 0.990
2010-11-28 12:00:00 0.998 0.987 0.997 0.990 0.997 0.990
2010-11-29 12:00:00 0.997 0.992 0.997 0.992 0.997 0.992
2010-11-30 12:00:00 0.997 0.987 0.997 0.987 0.997 0.987
2010-12-01 12:00:00 0.996 0.986 0.996 0.986 0.996 0.986
I want to take in a list like ['IWWGCW', 'IWWGDW', 'BASE']
and reorder this to be:
Experiment IWWGCW IWWGDW BASE
Lead Time 24 48 24 48 24 48
2010-11-27 12:00:00 0.998 0.990 0.998 0.990 0.997 0.991
2010-11-28 12:00:00 0.997 0.990 0.997 0.990 0.998 0.987
2010-11-29 12:00:00 0.997 0.992 0.997 0.992 0.997 0.992
2010-11-30 12:00:00 0.997 0.987 0.997 0.987 0.997 0.987
2010-12-01 12:00:00 0.996 0.986 0.996 0.986 0.996 0.986
with the caveat that I don't always know at what level "Experiment" will be. I tried (where df
is the multi-indexed frame shown above)
df2 = df.reindex_axis(['IWWGCW', 'IWWGDW', 'BASE'], axis=1, level='Experiment')
but that didn't seem to work - it completed successfully, but the DataFrame that was returned had its column order unchanged.
My workaround is to have a function like:
def reorder_columns(frame, column_name, new_order):
"""Shuffle the specified columns of the frame to match new_order."""
index_level = frame.columns.names.index(column_name)
new_position = lambda t: new_order.index(t[index_level])
new_index = sorted(frame.columns, key=new_position)
new_frame = frame.reindex_axis(new_index, axis=1)
return new_frame
where reorder_columns(df, 'Experiment', ['IWWGCW', 'IWWGDW', 'BASE'])
does what I expect but it feels like I'm doing extra work. Is there an easier way to do this?
Upvotes: 41
Views: 41369
Reputation: 11
I've refined the answers here and written a function that should work out of the box on a pandas dataframe with a two layer multi-index. This should extendable to higher order column multi-indexes by changing the "by" argument in the 3rd line of the function.
def reorder_multindex_columns(df):
level_names = list(df.columns.names)
multi_tuples_df = pd.DataFrame.from_records(df.columns.values)
multi_tuples_df = multi_tuples_df.sort_values(by = [0,1])
multi_tuples = list(multi_tuples_df.to_records(index=False))
multi_cols = pd.MultiIndex.from_tuples(multi_tuples, names = level_names)
return pd.DataFrame(df, columns=multi_cols)
Upvotes: 1
Reputation: 630
A solution from my comment above, using pandas 1.3.2:
df.reindex(columns=['IWWGCW', 'IWWGDW', 'BASE'], level='Experiment')
Upvotes: 18
Reputation: 301
This is the simplest one that worked for me:
for your selected level, create a list with columns in desired order;
reindex your columns and create a MultiIndex object from that list, keep in mind this returns a tuple;
use the MultiIndex object to reorder your DataFrame.
cols = ['IWWGCW', 'IWWGDW', 'BASE']
new_cols = df.columns.reindex(cols, level=0)
df.reindex(columns=new_cols[0]) #new_cols is a single item tuple
In one line:
df.reindex(columns=df.columns.reindex(['IWWGCW', 'IWWGDW', 'BASE'], level=0)[0])
voilá
Upvotes: 16
Reputation: 5314
The comment by andrew_reece should be the accepted answer. Simply use reindex().
Copy & pasting from the github issue:
>>> df
vals
first second third
mid 3rd 992 1.96
562 12.06
1st 73 -6.46
818 -15.75
658 5.90
btm 2nd 915 9.75
474 -1.47
905 -6.03
1st 717 8.01
909 -21.12
3rd 616 11.91
675 1.06
579 -4.01
top 1st 241 1.79
363 1.71
3rd 677 13.38
238 -16.77
407 17.19
2nd 728 -21.55
36 8.09
>>> df.reindex(['top', 'mid', 'btm'], level='first')
vals
first second third
top 1st 241 1.79
363 1.71
3rd 677 13.38
238 -16.77
407 17.19
2nd 728 -21.55
36 8.09
mid 3rd 992 1.96
562 12.06
1st 73 -6.46
818 -15.75
658 5.90
btm 2nd 915 9.75
474 -1.47
905 -6.03
1st 717 8.01
909 -21.12
3rd 616 11.91
675 1.06
579 -4.01
>>> df.reindex(['1st', '2nd', '3rd'], level='second')
vals
first second third
mid 1st 73 -6.46
818 -15.75
658 5.90
3rd 992 1.96
562 12.06
btm 1st 717 8.01
909 -21.12
2nd 915 9.75
474 -1.47
905 -6.03
3rd 616 11.91
675 1.06
579 -4.01
top 1st 241 1.79
363 1.71
2nd 728 -21.55
36 8.09
3rd 677 13.38
238 -16.77
407 17.19
>>> df.reindex(['top', 'btm'], level='first').reindex(['1st', '2nd'], level='second')
vals
first second third
top 1st 241 1.79
363 1.71
2nd 728 -21.55
36 8.09
btm 1st 717 8.01
909 -21.12
2nd 915 9.75
474 -1.47
905 -6.03
Upvotes: 2
Reputation: 9527
There is a very simple way: just create a new dataframe based on the original, with the correct order of multiindex columns:
multi_tuples = [('IWWGCW',24), ('IWWGCW',48), ('IWWGDW',24), ('IWWGDW',48)
, ('BASE',24), ('BASE',48)]
multi_cols = pd.MultiIndex.from_tuples(multi_tuples, names=['Experiment', 'Lead Time'])
df_ordered_multi_cols = pd.DataFrame(df_ori, columns=multi_cols)
Upvotes: 32
Reputation: 105531
I don't know of anything off-hand. Created an enhancement ticket about it:
http://github.com/pydata/pandas/issues/1864
Upvotes: 9