Reputation: 379
I have a pandas dataframe in Python in which one of the columns contains the letters D, W, M and Y, which stand for day, week, month and year. I want to sort the dataframe on this column in that order (D, W, M, Y). However, the only option I could find sorts alphabetically, which is not what I need. Is a custom sort order possible?
The dataframe below is an example of what I am trying to sort. I need it sorted first by id, then by unit (in the order D, W, M, Y), and then by time. I tried the regular sort function, but it does not work because it sorts the unit column alphabetically.
dataframe
id time unit
1 3 M
2 5 Y
1 12 D
8 6 W
6 1 M
1 2 W
dataframe.sort_values(by=['id', 'unit', 'time'])
Upvotes: 5
Views: 2819
Reputation: 1136
Alternatively, you could create a separate column in your DataFrame that holds each unit's position in the desired order, and sort on that column instead of on unit:
df['unit_index'] = df['unit'].apply(list('DWMY').index)
df.sort_values(['id','unit_index','time'])
id time unit unit_index
2 1 12 D 0
5 1 2 W 1
0 1 3 M 2
1 2 5 Y 3
4 6 1 M 2
3 8 6 W 1
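As a self-contained sketch of the same idea, the helper column can be built with a dictionary lookup and dropped again after sorting. The names unit_rank and result are only illustrative, and the data is the example from the question:

```python
import pandas as pd

# Example data taken from the question.
df = pd.DataFrame({
    "id": [1, 2, 1, 8, 6, 1],
    "time": [3, 5, 12, 6, 1, 2],
    "unit": ["M", "Y", "D", "W", "M", "W"],
})

# Map each unit to its position in the desired order: D=0, W=1, M=2, Y=3.
# (unit_rank is an illustrative name, not part of pandas.)
unit_rank = {unit: rank for rank, unit in enumerate("DWMY")}

result = (
    df.assign(unit_index=df["unit"].map(unit_rank))  # temporary sort key
      .sort_values(["id", "unit_index", "time"])     # id, then custom unit order, then time
      .drop(columns="unit_index")                    # remove the helper column again
)
print(result)
```

Note that sort_values returns a new DataFrame, so the result has to be assigned if you want to keep it. A unit that is missing from unit_rank would map to NaN and, by default, sort last within its id.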
Upvotes: 0
Reputation: 88226
You can define the unit column as a pd.Categorical and then sort the dataframe.
A categorical is sorted according to the order of its categories rather than alphabetically, so the categories list you pass specifies the order to follow when sorting. If you also pass ordered=True, the categorical supports comparisons and has a min and max value. Then you can simply sort the dataframe as you were doing yourself with sort_values:
df['unit'] = pd.Categorical(df.unit.tolist(), categories=list('DWMY'))
df.sort_values(['id','unit','time'])
id time unit
2 1 12 D
5 1 2 W
0 1 3 M
1 2 5 Y
4 6 1 M
3 8 6 W
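For reference, here is a minimal end-to-end sketch of this approach using the example data from the question. It passes ordered=True, which is not needed for sorting but is assumed here so that min and max are available as well:

```python
import pandas as pd

# Example data taken from the question.
df = pd.DataFrame({
    "id": [1, 2, 1, 8, 6, 1],
    "time": [3, 5, 12, 6, 1, 2],
    "unit": ["M", "Y", "D", "W", "M", "W"],
})

# The categories list defines the sort order; ordered=True additionally
# enables comparisons and min/max on the column.
df["unit"] = pd.Categorical(df["unit"], categories=list("DWMY"), ordered=True)

# Sort by id, then by unit in category order (D, W, M, Y), then by time.
result = df.sort_values(["id", "unit", "time"])
print(result)

# Smallest and largest unit according to the category order.
print(df["unit"].min(), df["unit"].max())
```

One thing to watch for: any value in the column that is not listed in categories becomes NaN when the column is converted.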
Upvotes: 6