harry04
harry04

Reputation: 962

Sorting datetime column when month is formatted as name

I have a column Date Time in my dataframe which has the date and time as a string.

Product       Date Time
ABC          Wed Jan 2 11:14:24 2019
ABC123       Wed Jan 2 11:14:24 2019
ABCXZY       Wed Jan 2 11:14:24 2019
BVF123       Mon Jan 14 10:24:20 2019
ABC456       Mon Jan 14 10:24:20 2019
ABC000       Mon Feb 4 10:44:08 2019
ABCXYZ       Mon Feb 4 10:44:08 2019
ABC678       Mon Feb 4 10:44:08 2019
ABCQYZ       Wed Feb 20 09:14:40 2019
ABC090       Wed Feb 20 09:14:40 2019  

I have converted this column to a datetime format using -

df['Date'] = pd.to_datetime(df['Date Time']).dt.strtime('%d-%b-%Y')

I want to now sort this dataframe on the basis of the Date column to plot the quantities for each date in ascending order of date, but when I use -

df.sort_values(by='Date', inplace=True, ascending=True)

it only gets sorted by the date and ignores the month name, i.e as

02-Jan-2019
04-Feb-2019
08-Mar-2019
13-Feb-2019
14-Jan-2019
20-Feb-2019
21-Mar-2019

instead of

02-Jan-2019
14-Jan-2019
04-Feb-2019
13-Feb-2019
20-Feb-2019
08-Mar-2019
21-Mar-2019

How can I get the desired sorting using pandas datetime or any other module?

Upvotes: 1

Views: 526

Answers (2)

jlesueur
jlesueur

Reputation: 326

pd.to_datetime(df['Date Time']).dt.strtime('%d-%b-%Y')

returns a series of string ("object type" to be precise) but not a series of datetime. That's why your sorting is wrong. Here is a code to do it :

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

### Dataframe    
data = {'Product' : ['ABC', 'ABC123', 'ABCXZY', 'BVF123', 'ABC456', 'ABC000', 'ABCXYZ', 'ABC678', 'ABCQYZ', 'ABC090'], 'Date Time' : ['Wed Jan 2 11:14:24 2019', 'Wed Jan 2 11:14:24 2019', 'Wed Jan 2 11:14:24 2019', 'Mon Jan 14 10:24:20 2019', 'Mon Jan 14 10:24:20 2019', 'Mon Feb 4 10:44:08 2019', 'Mon Feb 4 10:44:08 2019', 'Mon Feb 4 10:44:08 2019', 'Wed Feb 20 09:14:40 2019', 'Wed Feb 20 09:14:40 2019']}

df = pd.DataFrame(data)

### Conversion to datetime    
df['Date'] = pd.to_datetime(df.loc[:, 'Date Time'])

### Sorting
df.sort_values(by = 'Date', inplace = True)

### Plot    
ax = df.groupby('Date').count().Product.plot()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b-%Y')) # Formatting x labels

Upvotes: 1

BENY
BENY

Reputation: 323226

We can do argsort

df=df.iloc[pd.to_datetime(df['Date Time'],format='%d-%b-%Y').argsort()]
Out[20]: 
     Date Time
3  14-Jan-2019
0  04-Feb-2019
2  13-Feb-2019
4  20-Feb-2019
1  08-Mar-2019
5  21-Mar-2019

Update

s=df.groupby(['Date Time']).size()
s.index=pd.to_datetime(s.index,format='%d-%b-%Y')
s.sort_index(inplace=True)
s.index=s.index.strftime('%d-%b-%Y')

Upvotes: 1

Related Questions