Saturnix

Reputation: 10564

Python strptime with variable text

I have a list of dates as strings. It looks like this:

[
  "January 29-30 Meeting - 2013",
  "March 19-20 Meeting - 2013",
  "April/May 30-1 Meeting - 2013",
  "June 18-19 Meeting - 2013",
  "July 30-31 Meeting - 2013",
  "September 17-18 Meeting - 2013",
  "October 29-30 Meeting - 2013",
  "December 17-18 Meeting - 2013"
]

I need to parse these strings into datetime objects.

datetime.strptime("January 29-30 Meeting - 2013", "%B %d-[something] - %Y")
datetime.strptime("January 29-30 Meeting - 2013", "%B [something]-%d [something] - %Y")

Is there any way I can tell strptime, in the format specifier, to ignore the text in [something] since it can be variable? Is there a format specifier for variable text?

Upvotes: 4

Views: 2465

Answers (2)

Grismar

Reputation: 31329

strptime has no wildcard directive. You can see the list of supported directives here: https://docs.python.org/3/library/time.html#time.strftime
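
To make the point concrete, here is a minimal sketch of what happens when the string contains text the format does not account for. The helper name try_parse is only for illustration; it is not part of the datetime module.

```python
from datetime import datetime


def try_parse(text, fmt):
    """Hypothetical helper: return the parsed datetime, or None on a mismatch."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError as exc:
        # strptime requires the whole string to match the format exactly.
        print(f"rejected: {exc}")
        return None


# The extra "-30 Meeting" text has no matching directive, so parsing fails.
result = try_parse("January 29-30 Meeting - 2013", "%B %d - %Y")

# Once the variable text is gone, the same format works.
exact = try_parse("January 29 - 2013", "%B %d - %Y")
```

So the variable text has to be removed, or matched some other way, before strptime sees the string.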

A sensible way to solve your problem is to combine a regex with strptime: use the regex to pick out the parts you need, then either pass the remaining, restricted text to strptime or pass the matched groups directly to datetime.

import re
from datetime import datetime

ss = [
  "January 29-30 Meeting - 2013",
  "March 19-20 Meeting - 2013",
  "April/May 30-1 Meeting - 2013",
  "June 18-19 Meeting - 2013",
  "July 30-31 Meeting - 2013",
  "September 17-18 Meeting - 2013",
  "October 29-30 Meeting - 2013",
  "December 17-18 Meeting - 2013"
]

FORMAT = '%B %d %Y'

for s in ss:
    # Groups: 1 = month name, 2 = first day, 3 = last day, 4 = year
    match = re.search(r"(\w+)\s(\d+)-(\d+)\s.*\s(\d{4})", s)
    if match:
        # Rebuild a "month day year" string for each end of the range
        dt1 = datetime.strptime(f'{match.group(1)} {match.group(2)} {match.group(4)}', FORMAT)
        dt2 = datetime.strptime(f'{match.group(1)} {match.group(3)} {match.group(4)}', FORMAT)

        print(dt1, dt2)

Note that you also have the April/May 30-1 complication in there. The code above does not address it (it treats both days as being in May), since that is not what you asked about.

As a bonus though:

for s in ss:
    # Groups: 2 = optional first month (before "/"), 3 = month,
    # 4 = first day, 5 = last day, 6 = year
    match = re.search(r"((\w+)/)?(\w+)\s(\d+)-(\d+)\s.*\s(\d{4})", s)
    if match:
        # The start date uses the first month if there is one
        dt1 = datetime.strptime(
            f'{match.group(2) or match.group(3)} {match.group(4)} {match.group(6)}', FORMAT)
        dt2 = datetime.strptime(
            f'{match.group(3)} {match.group(5)} {match.group(6)}', FORMAT)

        print(dt1, dt2)
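
For the other option mentioned earlier, passing the matched groups directly to datetime, a rough sketch could look like the following. The names MONTHS, PATTERN and parse_meeting are made up for this example, and it assumes full English month names as in the question's list.

```python
import re
from datetime import datetime

# Assumption: month names are always full English names.
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Same pattern as above, with named groups; m1 is the optional first month.
PATTERN = re.compile(
    r"(?:(?P<m1>\w+)/)?(?P<m2>\w+)\s(?P<d1>\d+)-(?P<d2>\d+)\s.*\s(?P<year>\d{4})")


def parse_meeting(s):
    """Hypothetical helper: return (start, end) datetimes, or None if s does not match."""
    match = PATTERN.search(s)
    if match is None:
        return None
    year = int(match["year"])
    # The start date uses the first month when two months are given.
    start_month = MONTHS[match["m1"] or match["m2"]]
    end_month = MONTHS[match["m2"]]
    return (datetime(year, start_month, int(match["d1"])),
            datetime(year, end_month, int(match["d2"])))


start, end = parse_meeting("April/May 30-1 Meeting - 2013")
```

This skips strptime entirely, at the cost of maintaining the month lookup yourself. An unknown month name raises a KeyError here, which you may want to catch.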

Also note the interesting, if a bit hacky, solution offered by @blhsing below, which involves _strptime.TimeRE. I would not recommend doing anything like that, since it relies on a private module, but it is interesting to know that you can actually change the behaviour of strptime itself that way.

Upvotes: 3

blhsing

Reputation: 106553

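You can replace the cached _strptime.TimeRE object with one in which a directive lazily matches any sequence of characters. The example below uses %x, which normally stands for the locale's date representation, so that meaning is lost while the override is in place: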

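from datetime import datetime
import _strptime  # private CPython module; may change between versions

# Build a fresh directive-to-regex mapping and redefine %x as a lazy wildcard
time_re = _strptime.TimeRE()
time_re.update({'x': '.*?'})
# Make strptime use the modified mapping instead of its cached one
_strptime._TimeRE_cache = time_re
print(datetime.strptime("January 29-30 Meeting - 2013", "%B %d-%x - %Y"))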

This outputs:

2013-01-29 00:00:00

Upvotes: 1
