hk_03
hk_03

Reputation: 311

Convert string to datetime pandas

I have a DataFrame that contains strings which should be converted to datetime in order to sort the DataFrame. The strings are received from Syslogs.

enter image description here

The strings look like as the ones on the picture and below:

date
Mar 16 03:40:24.411
Mar 16 03:40:25.415
Mar 16 03:40:28.532
Mar 16 03:40:30.539
Mar 14 03:20:30.337
Mar 14 03:20:31.340
Mar 14 03:20:37.415

I tried to convert it with pandas.to_datetime(), but I received the following error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-03-16 03:40:24

I may need the nanoseconds as well.

Upvotes: 0

Views: 4264

Answers (2)

Prayson W. Daniel
Prayson W. Daniel

Reputation: 15558

The best way is using pandas.to_datetime as mentioned above. If you are not familiar with date string formatting, you can getaway using date parser libraries. Example dateutil library:

# python -m pip install —user dateutil
from dateutil import parser
import pandas as pd

df = pd.DataFrame({'dates': ['Mar 16 03:40:24.411',' Mar 16 03:40:25.415','Mar 16 03:40:28.532']})

# parse it
df['dates'] = df['dates'].apply(parser.parse)
print(df)

dateutil parser will add current year to your dates.

vectoring

# using numpy.vectorize
import numpy as np
df['dates'] =  np.vectorize(parser.parse)(df['dates'])

Note:

This is not optional for large datasets and should be used only when pd.to_datetime is not able to parse date.

Upvotes: 1

jezrael
jezrael

Reputation: 862406

Is necessary specify format of string with this reference. There is no year, so output year is default:

df['date'] = pd.to_datetime(df['date'], format='%b %d %H:%M:%S.%f')
print (df)
                     date
0 1900-03-16 03:40:24.411
1 1900-03-16 03:40:25.415
2 1900-03-16 03:40:28.532
3 1900-03-16 03:40:30.539
4 1900-03-14 03:20:30.337
5 1900-03-14 03:20:31.340
6 1900-03-14 03:20:37.415

You can add some year to column and then parse it like:

df['date'] = pd.to_datetime('2020 ' + df['date'], format='%Y %b %d %H:%M:%S.%f')
print (df)
                     date
0 2020-03-16 03:40:24.411
1 2020-03-16 03:40:25.415
2 2020-03-16 03:40:28.532
3 2020-03-16 03:40:30.539
4 2020-03-14 03:20:30.337
5 2020-03-14 03:20:31.340
6 2020-03-14 03:20:37.415

Upvotes: 2

Related Questions