Niklas9

Reputation: 9396

How to require a timestamp to be zero-padded during validation in Python?

I'm trying to validate a string that's supposed to contain a timestamp in the format of ISO 8601 (commonly used in JSON).

Python's strptime seems to be very forgiving when it comes to validating zero-padding, see code example below (note that the hour is missing a leading zero):

>>> import datetime
>>> s = '1985-08-23T3:00:00.000'
>>> datetime.datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f')
datetime.datetime(1985, 8, 23, 3, 0)

It happily accepts a string whose hour is not zero-padded, and doesn't raise a ValueError as I would expect.

Is there any way to force strptime to validate that the fields are zero-padded? Or is there any other built-in function in Python's standard library that does?

I would like to avoid writing my own regexp for this.

Upvotes: 15

Views: 4393

Answers (4)

Arminius

Reputation: 1169

There is already an answer explaining that parsing ISO 8601 or RFC 3339 date/time strings with Python's strptime() is impossible: How to parse an ISO 8601-formatted date? So, to answer your question: no, there is no way in the standard Python library to reliably parse such a date. Regarding the regex suggestions, a date string like

2020-14-32T45:33:44.123

would be accepted as a valid date by a pattern that only counts digits. There are lots of Python modules for this (search for "iso8601" on https://pypi.python.org), but building a complete ISO 8601 validator would require handling things like leap seconds, the list of possible time zone offset values, and much more.
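To illustrate the point about out-of-range fields, here is a minimal sketch that pairs a shape-only regex with strptime, so that the regex enforces zero-padding and strptime rejects impossible values. The helper name is_valid_timestamp and the fixed three-digit fraction are assumptions made for this example; it covers only the single format from the question, not ISO 8601 in general.

```python
import datetime
import re

# Shape check only: digit counts and separators, no range checking.
# re.ASCII restricts \d to 0-9.
SHAPE = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}', re.ASCII)
FORMAT = '%Y-%m-%dT%H:%M:%S.%f'


def is_valid_timestamp(s):
    """Hypothetical helper: True only if s is zero-padded and a real date/time."""
    if SHAPE.fullmatch(s) is None:
        return False  # wrong shape, e.g. a missing leading zero
    try:
        datetime.datetime.strptime(s, FORMAT)
    except ValueError:
        return False  # right shape, but month/day/hour etc. out of range
    return True


for candidate in ('1985-08-23T03:00:00.000',
                  '1985-08-23T3:00:00.000',
                  '2020-14-32T45:33:44.123'):
    print(candidate, is_valid_timestamp(candidate))
```

Only the first candidate passes: the second fails the shape check, and the third has the right shape but is rejected by strptime.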

Upvotes: 5

user3657941

Reputation:

You said you want to avoid a regex, but this is actually the type of problem where a regex is appropriate. As you discovered, strptime is very flexible about the input it will accept. However, the regex for this problem is relatively easy to compose:

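import re

# The dot before the fraction is escaped so it only matches a literal '.'.
date_pattern = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}')
s_list = [
    '1985-08-23T3:00:00.000',
    '1985-08-23T03:00:00.000'
]
for s in s_list:
    # fullmatch also rejects trailing characters after the timestamp
    if date_pattern.fullmatch(s):
        print("%s is valid" % s)
    else:
        print("%s is invalid" % s)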

Output

1985-08-23T3:00:00.000 is invalid
1985-08-23T03:00:00.000 is valid


Upvotes: 1

Uvar

Reputation: 3462

The only thing I can think of, short of messing with Python internals, is to check the format against what you know it should look like.

So, if I understand correctly, the format is '%Y-%m-%dT%H:%M:%S.%f' and should be zero-padded. In that case you know the exact length of the string you are looking for, and you can reproduce the intended error yourself.

import datetime

fmt = '%Y-%m-%dT%H:%M:%S.%f'
s = '1985-08-23T3:00:00.000'

parsed = datetime.datetime.strptime(s, fmt)  # still raises ValueError for unparseable input
# A fully zero-padded timestamp with a 3-digit fraction is exactly 23 characters long.
if len(s) != 23:
    raise ValueError("time data '{}' does not match format '{}'".format(s, fmt))
print(parsed)  # just for good measure

>>ValueError: time data '1985-08-23T3:00:00.000' does not match format '%Y-%m-%dT%H:%M:%S.%f'

Upvotes: 0

Eugene Pakhomov

Reputation: 10707

To make strptime validate leading zeros for you, you'll have to add your own literals to Python's _strptime._TimeRE_cache. That solution is very hacky, most likely not portable, and still requires writing a regex, although only for the hour part of the timestamp.

Another solution would be to write your own function that uses strptime, converts the parsed date back to a string, and compares the two strings. This solution is portable, but it lacks clear error messages: you won't be able to tell whether the missing leading zero was in the hours, minutes, or seconds.
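The round-trip idea could look roughly like the sketch below. The function name parse_strict is made up for this example, and it assumes Python 3.6+ so that isoformat(timespec='milliseconds') can reproduce the three-digit fraction from the question; a different target format would need a different way of formatting the parsed value back.

```python
import datetime

FORMAT = '%Y-%m-%dT%H:%M:%S.%f'


def parse_strict(s):
    """Hypothetical helper: parse s, then insist it matches its own canonical form."""
    parsed = datetime.datetime.strptime(s, FORMAT)  # ValueError if unparseable
    # isoformat always zero-pads every field; timespec='milliseconds'
    # forces exactly three fractional digits.
    canonical = parsed.isoformat(timespec='milliseconds')
    if canonical != s:
        raise ValueError(
            "time data %r is not in zero-padded form (expected %r)" % (s, canonical)
        )
    return parsed


print(parse_strict('1985-08-23T03:00:00.000'))

try:
    parse_strict('1985-08-23T3:00:00.000')
except ValueError as exc:
    print(exc)
```

Including the canonical string in the message at least shows what was expected, even though it does not name the offending field.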

Upvotes: 1
