Reputation: 73
I have a pandas dataframe, as shown below:
Timestamp_Start Event_ID Duration
555.54944 Fix_1 0.42248
559.07281 Fix_10 0.01996
559.14642 Fix_11 0
556.03192 Fix_2 0.16113
556.27985 Fix_3 0.24188
556.56097 Fix_4 0.04987
556.65497 Fix_5 0.10748
556.80859 Fix_6 0.75708
557.57983 Fix_7 0.11329
557.75348 Fix_8 0.65643
558.43665 Fix_9 0.27447
555.97925 Sac_1 0.04577
559.09961 Sac_10 0.0404
559.15302 Sac_11 0.00726
556.19916 Sac_2 0.07403
556.52747 Sac_3 0.02789
556.61865 Sac_4 0.02985
556.76849 Sac_5 0.0337
557.57294 Sac_6 0
557.69965 Sac_7 0.04687
558.41632 Sac_8 0.01325
558.71796 Sac_9 0.34552
I want to sort the 'Event_ID' column, so that Fix_1,Fix_2,Fix_3... and Sac_1,Sac_2,Sac_3... appear in order, like below:
Timestamp_StartEvent_ID Duration
555.54944 Fix_1 0.42248
556.03192 Fix_2 0.16113
556.27985 Fix_3 0.24188
556.56097 Fix_4 0.04987
556.65497 Fix_5 0.10748
556.80859 Fix_6 0.75708
557.57983 Fix_7 0.11329
557.75348 Fix_8 0.65643
558.43665 Fix_9 0.27447
559.07281 Fix_10 0.01996
559.14642 Fix_11 0
555.97925 Sac_1 0.04577
556.19916 Sac_2 0.07403
556.52747 Sac_3 0.02789
556.61865 Sac_4 0.02985
556.76849 Sac_5 0.0337
557.57294 Sac_6 0
557.69965 Sac_7 0.04687
558.41632 Sac_8 0.01325
558.71796 Sac_9 0.34552
559.09961 Sac_10 0.0404
559.15302 Sac_11 0.00726
Any ideas on how to do that? Thanks for your help.
Upvotes: 0
Views: 578
Reputation: 29742
One way using distutils.version
:
import numpy as np
from distutils.version import LooseVersion
f = np.vectorize(LooseVersion)
new_df = df.sort_values("Event_ID", key=f)
print(new_df)
Output:
Timestamp_Start Event_ID Duration
0 555.54944 Fix_1 0.42248
3 556.03192 Fix_2 0.16113
4 556.27985 Fix_3 0.24188
5 556.56097 Fix_4 0.04987
6 556.65497 Fix_5 0.10748
7 556.80859 Fix_6 0.75708
8 557.57983 Fix_7 0.11329
9 557.75348 Fix_8 0.65643
10 558.43665 Fix_9 0.27447
1 559.07281 Fix_10 0.01996
2 559.14642 Fix_11 0.00000
11 555.97925 Sac_1 0.04577
14 556.19916 Sac_2 0.07403
15 556.52747 Sac_3 0.02789
16 556.61865 Sac_4 0.02985
17 556.76849 Sac_5 0.03370
18 557.57294 Sac_6 0.00000
19 557.69965 Sac_7 0.04687
20 558.41632 Sac_8 0.01325
21 558.71796 Sac_9 0.34552
12 559.09961 Sac_10 0.04040
13 559.15302 Sac_11 0.00726
Upvotes: 4
Reputation: 11927
Normal sorting on the dataframe will not work, as you need the integer in the string to be treated as int value.
It can be done with extra space though.
You can make two columns like this,
df['event'] = df.Event_ID.str.rsplit("_").str[0]
df['idx'] = df.Event_ID.str.rsplit("_").str[-1].astype(int)
Now, sort on these two columns,
df.sort_values(['event', 'idx'])
Timestamp_Start Event_ID Duration idx event
0 555.54944 Fix_1 0.42248 1 Fix
3 556.03192 Fix_2 0.16113 2 Fix
4 556.27985 Fix_3 0.24188 3 Fix
5 556.56097 Fix_4 0.04987 4 Fix
6 556.65497 Fix_5 0.10748 5 Fix
7 556.80859 Fix_6 0.75708 6 Fix
8 557.57983 Fix_7 0.11329 7 Fix
9 557.75348 Fix_8 0.65643 8 Fix
10 558.43665 Fix_9 0.27447 9 Fix
1 559.07281 Fix_10 0.01996 10 Fix
2 559.14642 Fix_11 0.00000 11 Fix
11 555.97925 Sac_1 0.04577 1 Sac
14 556.19916 Sac_2 0.07403 2 Sac
15 556.52747 Sac_3 0.02789 3 Sac
16 556.61865 Sac_4 0.02985 4 Sac
17 556.76849 Sac_5 0.03370 5 Sac
18 557.57294 Sac_6 0.00000 6 Sac
19 557.69965 Sac_7 0.04687 7 Sac
20 558.41632 Sac_8 0.01325 8 Sac
21 558.71796 Sac_9 0.34552 9 Sac
12 559.09961 Sac_10 0.04040 10 Sac
13 559.15302 Sac_11 0.00726 11 Sac
You can reset_index, drop extra columns as needed
Upvotes: 1