ThrawnCA

Reputation: 1081

How do I check whether a date input includes days in Python?

I'm working on a block of Python code that is meant to test inputs to determine whether they're numeric, timestamps, free text, etc. To detect dates, it uses the dateutil parser, then checks if the parse succeeded or an exception was thrown.

However, the dateutil parser is too forgiving and turns all manner of values into datetime objects. For example, a time range such as "12:00-16:00" is converted into a timestamp on the current day, e.g. "2023-08-22T12:00-16:00" (which isn't even a valid timezone offset).

We'd like to treat inputs as dates only if they actually have a day-month-year component, not if they're just hours and minutes, but we still want to accept various date formats: yyyy-MM-ddThh:mm:ss, dd/MM/yyyy, or whatever the input uses. Is there another library better suited to this, or some way to make dateutil stricter?

Upvotes: 1

Views: 247

Answers (2)

juanpa.arrivillaga

Reputation: 96257

Looking at the source code, it doesn't look like there is any way to make the parser stricter.

But it is open source, so we can see what is going on. The main work happens in the parser._parse method, which runs a lot of logic to resolve a date and always ends by returning a tuple of (result, skipped tokens), with None in place of the result when parsing fails:

    except (IndexError, ValueError):
        return None, None

    if not info.validate(res):
        return None, None

    if fuzzy_with_tokens:
        skipped_tokens = self._recombine_skipped(l, skipped_idxs)
        return res, tuple(skipped_tokens)
    else:
        return res, None

Then, in parser.parse, the default values are filled into this result (and appropriate errors are raised if the result is None or empty):

    if default is None:
        default = datetime.datetime.now().replace(hour=0, minute=0,
                                                  second=0, microsecond=0)

    res, skipped_tokens = self._parse(timestr, **kwargs)

    if res is None:
        raise ParserError("Unknown string format: %s", timestr)

    if len(res) == 0:
        raise ParserError("String does not contain a date: %s", timestr)

    try:
        ret = self._build_naive(res, default)
    except ValueError as e:
        six.raise_from(ParserError(str(e) + ": %s", timestr), e)

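The actual filling in happens in _build_naive: every field missing from the result is taken from default, which is why "12:00-16:00" ends up on today's date.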
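Because every missing field is copied from default, one way to detect missing fields using only the public parse() signature (a different technique from the monkey-patch this answer goes on to use) is to parse the string twice with two sentinel defaults that differ in year, month and day. If the two results land on different dates, at least one of those fields came from the default rather than from the input. A minimal sketch, where parse_full_date and the sentinel dates are hypothetical names chosen for illustration:

```python
from datetime import datetime

from dateutil import parser

# Two sentinel defaults that differ in year, month AND day, so any date field
# dateutil has to fill in from `default` will differ between the two parses.
_DEFAULT_A = datetime(2000, 1, 1)
_DEFAULT_B = datetime(2001, 2, 2)


def parse_full_date(text):
    """Parse `text` only if it spells out a year, month and day itself.

    Returns a datetime, or None if the string is not parseable or if any of
    year/month/day had to be taken from the default.
    """
    try:
        a = parser.parse(text, default=_DEFAULT_A)
        b = parser.parse(text, default=_DEFAULT_B)
    except (ValueError, OverflowError):
        # dateutil's ParserError is a subclass of ValueError
        return None
    if a.date() != b.date():
        # At least one of year/month/day came from the default, not the input
        return None
    return a
```

The cost is parsing every string twice. Note that "10/08/1988" is still read month-first, as dateutil does by default; pass dayfirst=True to parser.parse if your inputs are dd/MM/yyyy.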

I am not suggesting this is the best route to go, but you can monkey-patch parser._parse so that it raises an error whenever the result is missing any of the year, month and day attributes. To make this slightly safer, we can wrap the patch in a context manager that always restores the original method on exit:

    import contextlib
    import dateutil.parser

    @contextlib.contextmanager
    def strict_parser():
        # Keep a reference to the real method so it can be restored
        original_parse = dateutil.parser._parser.parser._parse
        def _parse_patch(self, *args, **kwargs):
            # _parse returns (res, skipped_tokens); res is None if parsing failed
            return_value = original_parse(self, *args, **kwargs)
            parsed_result = return_value[0]
            for attr in "year", "month", "day":
                if not getattr(parsed_result, attr, None):
                    raise dateutil.parser.ParserError(
                        f"Require a full year, month, and day, did not find a {attr}"
                    )
            return return_value
        dateutil.parser._parser.parser._parse = _parse_patch # do the monkey patch
        try:
            yield
        finally:
            # Undo the patch even if parsing raised inside the with block
            dateutil.parser._parser.parser._parse = original_parse

So, here is how this would work:

    In [1]: import contextlib
       ...: import dateutil.parser
       ...:
       ...: @contextlib.contextmanager
       ...: def strict_parser():
       ...:     original_parse = dateutil.parser._parser.parser._parse
       ...:     def _parse_patch(self, *args, **kwargs):
       ...:         return_value = original_parse(self, *args, **kwargs)
       ...:         parsed_result = return_value[0]
       ...:         for attr in "year", "month", "day":
       ...:             if not getattr(parsed_result, attr, None):
       ...:                 raise dateutil.parser.ParserError(
       ...:                     f"Require a full year, month, and day, did not find a {attr}"
       ...:                 )
       ...:         return return_value
       ...:     dateutil.parser._parser.parser._parse = _parse_patch # do the monkey patch
       ...:     try:
       ...:         yield
       ...:     finally:
       ...:         dateutil.parser._parser.parser._parse = original_parse
       ...:

    In [2]: dateutil.__version__
    Out[2]: '2.8.2'

    In [3]: dateutil.parser.parse("12:00-16:00")
    Out[3]: datetime.datetime(2023, 8, 22, 12, 0, tzinfo=tzoffset(None, -57600))

    In [4]: with strict_parser():
       ...:     print(dateutil.parser.parse("12:00-16:00"))
       ...:
    ---------------------------------------------------------------------------
    ParserError                               Traceback (most recent call last)
    Cell In[4], line 2
          1 with strict_parser():
    ----> 2     print(dateutil.parser.parse("12:00-16:00"))

    File ~/miniconda3/envs/py311/lib/python3.11/site-packages/dateutil/parser/_parser.py:1368, in parse(timestr, parserinfo, **kwargs)
       1366     return parser(parserinfo).parse(timestr, **kwargs)
       1367 else:
    -> 1368     return DEFAULTPARSER.parse(timestr, **kwargs)

    File ~/miniconda3/envs/py311/lib/python3.11/site-packages/dateutil/parser/_parser.py:640, in parser.parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
        636 if default is None:
        637     default = datetime.datetime.now().replace(hour=0, minute=0,
        638                                               second=0, microsecond=0)
    --> 640 res, skipped_tokens = self._parse(timestr, **kwargs)
        642 if res is None:
        643     raise ParserError("Unknown string format: %s", timestr)

    Cell In[1], line 12, in strict_parser.<locals>._parse_patch(self, *args, **kwargs)
         10 for attr in "year", "month", "day":
         11     if not getattr(parsed_result, attr, None):
    ---> 12         raise dateutil.parser.ParserError(
         13             f"Require a full year, month, and day, did not find a {attr}"
         14         )
         15 return return_value

    ParserError: Require a full year, month, and day, did not find a year

    In [5]: with strict_parser():
       ...:     print(dateutil.parser.parse("10/08/1988"))
       ...:
    1988-10-08 00:00:00

    In [6]: dateutil.parser.parse("12:00-16:00")
    Out[6]: datetime.datetime(2023, 8, 22, 12, 0, tzinfo=tzoffset(None, -57600))

Again, monkey-patching is always a hack, but it is relatively easy to do. Of course, you now take on the responsibility of maintaining this patch, because it relies on internal implementation details that may change between dateutil versions.
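Because the context manager replaces the method on the shared parser class, the patch is visible to every thread while the with block is active. A variant of the same idea that avoids global patching is to subclass dateutil.parser.parser and override _parse on the subclass only. It still depends on the same private _parse method (this sketch assumes the 2.8.x behaviour shown above, where _parse returns a (res, skipped_tokens) tuple), and StrictDateParser and STRICT_PARSER are hypothetical names:

```python
from dateutil import parser as du_parser


class StrictDateParser(du_parser.parser):
    """A dateutil parser that rejects strings without an explicit year, month and day.

    Overrides the private _parse hook (dateutil 2.8.x layout), so re-check it
    when upgrading dateutil.
    """

    def _parse(self, timestr, **kwargs):
        # _parse returns (res, skipped_tokens); res is None if parsing failed
        res, skipped_tokens = super()._parse(timestr, **kwargs)
        if res is not None:
            for attr in ("year", "month", "day"):
                if getattr(res, attr, None) is None:
                    raise du_parser.ParserError(
                        "Require a full year, month, and day, did not find a %s: %s",
                        attr,
                        timestr,
                    )
        # Unparseable input (res is None) falls through to dateutil's own error
        return res, skipped_tokens


# One shared instance; unlike the context manager, this never touches the
# default parser, so plain dateutil.parser.parse() keeps its usual behaviour.
STRICT_PARSER = StrictDateParser()
```

Call STRICT_PARSER.parse(text) wherever you want the strict behaviour; it accepts the same keyword arguments (dayfirst, fuzzy, and so on) as dateutil.parser.parse.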

Upvotes: 1

darkstar

Reputation: 71

How about Python's re module? You can check the string against a regular expression to determine whether it looks like valid date/datetime data, and only then parse it with the dateutil module.

For example, the following snippet checks whether the input string contains a date in yyyy-mm-dd format:

    import re

    # Compiled once; matches a "yyyy-mm-dd" pattern anywhere in the string
    date_regex = re.compile(r"\d{4}-\d{2}-\d{2}")

    def check_date(text):
        return date_regex.search(text) is not None

Then, depending on the function's return value, you can use the dateutil or datetime module to parse the string into a date/datetime object.
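Extending that idea to the several layouts mentioned in the question, a minimal sketch could pair each accepted regex with a datetime.strptime format and require the whole string to match, so a time range such as "12:00-16:00" is never mistaken for a date. DATE_FORMATS and parse_date are hypothetical names, and the format list is an assumption to be replaced with whatever layouts your inputs actually use:

```python
import re
from datetime import datetime

# Hypothetical table of accepted layouts: each regex gates one strptime format.
DATE_FORMATS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"), "%Y-%m-%dT%H:%M:%S"),
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),
    (re.compile(r"\d{2}/\d{2}/\d{4}"), "%d/%m/%Y"),
]


def parse_date(text):
    """Return a datetime if `text` fully matches one of DATE_FORMATS, else None."""
    text = text.strip()
    for regex, fmt in DATE_FORMATS:
        if regex.fullmatch(text):
            try:
                return datetime.strptime(text, fmt)
            except ValueError:
                # Right shape but impossible values, e.g. 31/02/2020
                return None
    return None
```

Using fullmatch rather than search rejects strings with extra text around the date. Also, unlike dateutil's default, "10/08/1988" is read day-first here, because the table maps that layout to %d/%m/%Y.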

Upvotes: 1
