ScalaBoy
ScalaBoy

Reputation: 3392

ValueError: time data '10/11/2006 24:00' does not match format '%d/%m/%Y %H:%M'

I tried:

df["datetime_obj"] = df["datetime"].apply(lambda dt: datetime.strptime(dt, "%d/%m/%Y %H:%M"))

but got this error:

ValueError: time data '10/11/2006 24:00' does not match format '%d/%m/%Y %H:%M'

How to solve it correctly?

Upvotes: 3

Views: 2275

Answers (3)

jpp
jpp

Reputation: 164843

Your data doesn't follow the conventions used by Python / Pandas datetime objects. There should be only one way of storing a particular datetime, i.e. '10/11/2006 24:00' should be rewritten as '11/11/2006 00:00'.

Here's one way to approach the problem:

# find datetimes which have '24:00' and rewrite
twenty_fours = df['strings'].str[-5:] == '24:00'
df.loc[twenty_fours, 'strings'] = df['strings'].str[:-5] + '00:00'

# construct datetime series
df['datetime'] = pd.to_datetime(df['strings'], format='%d/%m/%Y %H:%M')

# add one day where applicable
df.loc[twenty_fours, 'datetime'] += pd.DateOffset(1)

Here's some data to test:

dateList = ['10/11/2006 24:00', '11/11/2006 00:00', '12/11/2006 15:00']
df = pd.DataFrame({'strings': dateList})

Result after transformations described above:

print(df['datetime'])

0   2006-11-11 00:00:00
1   2006-11-11 00:00:00
2   2006-11-12 15:00:00
Name: datetime, dtype: datetime64[ns]

Upvotes: 1

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 477794

The reason why this does not work is because the %H parameter only accepts values in the range of 00 to 23 (both inclusive). This thus means that 24:00 is - like the error says - not a valid time string.

I think therefore we have not much other options than convert the string to a valid format. We can do this by first replacing 24:00 with 00:00, and then later increment the day for these timestamps.

Like:

from datetime import timedelta
import pandas as pd

df['datetime_zero'] = df['datetime'].str.replace('24:00', '0:00')
df['datetime_er'] = pd.to_datetime(df['datetime_zero'], format='%d/%m/%Y %H:%M')
selrow = df['datetime'].str.contains('24:00')
df['datetime_obj'] = df['datetime_er'] + selrow * timedelta(days=1)

The last line thus adds one day to the rows that contain 24:00, such that '10/11/2006 24:00' gets converted to '11/11/2006 24:00'. Note however that the above is rather unsafe since depending on the format of the timestamp this will/will not work. For the above it will (probably) work, since there is only one colon. But if for example the datetimes have seconds as well, the filter could get triggered for 00:24:00, so it might require some extra work to get it working.

Upvotes: 3

Matthieu Brucher
Matthieu Brucher

Reputation: 22043

As indicated in the documentation (https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior), hours go from 00 to 23. 24:00 is then an error.

Upvotes: 0

Related Questions