Reputation: 9396
I'm trying to validate a string that's supposed to contain a timestamp in the format of ISO 8601 (commonly used in JSON).
Python's strptime seems to be very forgiving about zero-padding; see the code example below (note that the hour is missing a leading zero):
>>> import datetime
>>> s = '1985-08-23T3:00:00.000'
>>> datetime.datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f')
datetime.datetime(1985, 8, 23, 3, 0)
It gracefully accepts a string whose hour is not zero-padded and doesn't raise a ValueError exception as I would expect.
Is there any way to force strptime to validate the zero-padding? Or is there another built-in function in Python's standard library that does?
I would like to avoid writing my own regexp for this.
Upvotes: 15
Views: 4393
Reputation: 1169
There is already an answer saying that parsing ISO 8601 or RFC 3339 date/time strings with Python's strptime() is impossible: How to parse an ISO 8601-formatted date? So, to answer your question: no, there is no way in the standard Python library to reliably parse such a date. Regarding the regex suggestions, a date string like
2020-14-32T45:33:44.123
would be accepted as a valid date by a pattern that only checks digit counts, even though the month, day, and hour are out of range. There are lots of Python modules for this (search for "iso8601" on https://pypi.python.org), but building a complete ISO 8601 validator would require handling things like leap seconds, the list of possible time zone offset values, and many more.
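If only the single fixed layout from the question matters (not ISO 8601 in general), the two concerns can be split: a regex checks the digit counts and strptime checks the value ranges. The following is a minimal sketch under that assumption; the function name is_valid_timestamp and the three-digit fraction are illustrative choices, not something the standard library provides.

```python
import datetime
import re

# Shape check only: exactly two digits per field, three for the fraction.
# re.ASCII keeps \d from matching non-ASCII digits.
_SHAPE = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}', re.ASCII)
_FMT = '%Y-%m-%dT%H:%M:%S.%f'


def is_valid_timestamp(s):
    """Hypothetical helper: True only if s is zero-padded and in range."""
    if not _SHAPE.fullmatch(s):
        return False  # wrong padding or stray characters
    try:
        datetime.datetime.strptime(s, _FMT)  # rejects month 14, hour 45, ...
    except ValueError:
        return False
    return True


for candidate in ('1985-08-23T03:00:00.000',
                  '1985-08-23T3:00:00.000',
                  '2020-14-32T45:33:44.123'):
    print(candidate, is_valid_timestamp(candidate))
```

This still says nothing about time zone offsets, week dates, or the other ISO 8601 variants mentioned above.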
Upvotes: 5
Reputation:
You said you want to avoid a regex, but this is actually the type of problem where a regex is appropriate. As you discovered, strptime is very flexible about the input it will accept. However, the regex for this problem is relatively easy to compose:
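import re

# Exactly two digits per field; the dot is escaped so it only matches a literal '.'
date_pattern = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}')
s_list = [
    '1985-08-23T3:00:00.000',
    '1985-08-23T03:00:00.000',
]
for s in s_list:
    # fullmatch() also rejects trailing characters, which match() would ignore
    if date_pattern.fullmatch(s):
        print("%s is valid" % s)
    else:
        print("%s is invalid" % s)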
Output
1985-08-23T3:00:00.000 is invalid
1985-08-23T03:00:00.000 is valid
Upvotes: 1
Reputation: 3462
The only thing I can think of, short of messing with Python internals, is to check the format yourself, since you know what you are looking for.
So, if I understand it correctly, the format is '%Y-%m-%dT%H:%M:%S.%f' and every field should be zero-padded.
Then you know the exact length of the string you are looking for and can reproduce the intended result:
import datetime

fmt = '%Y-%m-%dT%H:%M:%S.%f'
s = '1985-08-23T3:00:00.000'
parsed = datetime.datetime.strptime(s, fmt)
# A fully zero-padded timestamp with a three-digit fraction is exactly 23 characters long
if len(s) != 23:
    raise ValueError("time data '{}' does not match format '{}'".format(s, fmt))
print(parsed)  # just for good measure
>>ValueError: time data '1985-08-23T3:00:00.000' does not match format '%Y-%m-%dT%H:%M:%S.%f'
Upvotes: 0
Reputation: 10707
To force strptime to validate leading zeros for you, you'll have to add your own literals to Python's _strptime._TimeRE_cache. The solution is very hacky, most likely not very portable, and requires writing a regex, although only for the hour part of a timestamp.
Another solution would be to write your own function that uses strptime, converts the parsed date back to a string, and compares the two strings. This solution is portable, but it lacks clear error messages: you won't be able to tell whether the missing leading zero is in the hours, minutes, or seconds.
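A rough sketch of the round-trip idea, assuming the timestamp always carries a three-digit fraction as in the question. The name parse_strict is made up for illustration. Note that strftime with %f would write six digits, so the sketch uses isoformat(timespec='milliseconds') (Python 3.6+) to rebuild the string instead.

```python
import datetime

FMT = '%Y-%m-%dT%H:%M:%S.%f'


def parse_strict(s):
    """Hypothetical helper: parse s, then insist it round-trips unchanged."""
    parsed = datetime.datetime.strptime(s, FMT)
    # Rebuild the zero-padded form with millisecond precision.
    canonical = parsed.isoformat(timespec='milliseconds')
    if canonical != s:
        raise ValueError(
            "time data %r is not in the canonical form %r" % (s, canonical))
    return parsed


print(parse_strict('1985-08-23T03:00:00.000'))
try:
    parse_strict('1985-08-23T3:00:00.000')
except ValueError as exc:
    print(exc)
```

The error message at least shows the expected canonical string next to the input, which makes the missing zero easier to spot, but it still does not name the offending field.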
Upvotes: 1