How can I sort time in dataframe in terms of hour,minute,seconds and milliseconds?

I have a problem about sorting time including hour,minute,seconds and milliseconds as ascending.

In my dafaframe, time column has defined as shown below. ( df['Time'].unique() )

array(['2:13:23.600', '3:13:18.700', '2:02:53.700', ..., '1:33:55.653',
       '1:33:14.678', '1:34:05.715'], dtype=object)

Time is also included this kind of variables such as 42:53.700 , 5:30.622 , 10.111

How can I sort this column?

Here is my screenshot

enter image description here

Upvotes: 1

Views: 1175

Answers (1)

jezrael
jezrael

Reputation: 863166

Convert values to timedeltas by to_timedelta and then sorting by DataFrame.sort_values:

a = np.array(['2:13:23.600', '3:13:18.700', '2:02:53.700', '1:33:55.653',
              '1:33:14.678', '1:34:05.715'])
df = pd.DataFrame({'Time':a})

df['Time'] = pd.to_timedelta(df['Time'])
df = df.sort_values('Time')
print (df)
             Time
4 01:33:14.678000
3 01:33:55.653000
5 01:34:05.715000
2 02:02:53.700000
0 02:13:23.600000
1 03:13:18.700000

Another idea with Series.argsort for array of position and for change order pass to DataFrame.iloc, but because multiple format create multiple Series for handle each of them, join together by Series.fillna for replace non matched values (missing values):

a = np.array(['2:13:23.600', '3:13:18.700', '2:02:53.700', '1:33:55.653',
              '1:33:14.678', '1:34:05.715', '42:53.700' , '5:30.622' , '10.111'])

df = pd.DataFrame({'Time':a})

d1 = pd.to_datetime(df['Time'], format='%H:%M:%S.%f', errors='coerce')
d2 = pd.to_datetime(df['Time'], format='%M:%S.%f', errors='coerce')
d3 = pd.to_datetime(df['Time'], format='%S.%f', errors='coerce')

d = d1.fillna(d2).fillna(d3)
print (d)
0   1900-01-01 02:13:23.600
1   1900-01-01 03:13:18.700
2   1900-01-01 02:02:53.700
3   1900-01-01 01:33:55.653
4   1900-01-01 01:33:14.678
5   1900-01-01 01:34:05.715
6   1900-01-01 00:42:53.700
7   1900-01-01 00:05:30.622
8   1900-01-01 00:00:10.111
Name: Time, dtype: datetime64[ns]

Check if all values are converted, so here is nonecessary empty Series:

print (d[d.isna()])

Series([], Name: Time, dtype: datetime64[ns])

And last change order:

df = df.iloc[d.argsort()]
print (df)
          Time
8       10.111
7     5:30.622
6    42:53.700
4  1:33:14.678
3  1:33:55.653
5  1:34:05.715
2  2:02:53.700
0  2:13:23.600
1  3:13:18.700

Upvotes: 1

Related Questions