Reputation: 10564
I have a list of dates as strings. It looks like this:
[
"January 29-30 Meeting - 2013",
"March 19-20 Meeting - 2013",
"April/May 30-1 Meeting - 2013",
"June 18-19 Meeting - 2013",
"July 30-31 Meeting - 2013",
"September 17-18 Meeting - 2013",
"October 29-30 Meeting - 2013",
"December 17-18 Meeting - 2013"
]
I need to parse these dates into datetime objects.
datetime.strptime("January 29-30 Meeting - 2013", "%B %d-[something] - %Y")
datetime.strptime("January 29-30 Meeting - 2013", "%B [something]-%d [something] - %Y")
Is there any way I can tell strptime, in the format specifier, to ignore the text in [something]
since it can be variable? Is there a format specifier for variable text?
Upvotes: 4
Views: 2465
Reputation: 31329
There is no wildcard directive for strptime. You can see the list of directives here: https://docs.python.org/3/library/time.html#time.strftime
A sensible way to solve your problem is to combine a regex with strptime: use the regex to filter out the variable text and pass the remaining, restricted text to strptime, or pass the matched groups directly to datetime.
import re
from datetime import datetime
ss = [
"January 29-30 Meeting - 2013",
"March 19-20 Meeting - 2013",
"April/May 30-1 Meeting - 2013",
"June 18-19 Meeting - 2013",
"July 30-31 Meeting - 2013",
"September 17-18 Meeting - 2013",
"October 29-30 Meeting - 2013",
"December 17-18 Meeting - 2013"
]
FORMAT = '%B %d %Y'
for s in ss:
    match = re.search(r"(\w+)\s(\d+)-(\d+)\s.*\s(\d{4})", s)
    if match:
        dt1 = datetime.strptime(f'{match.group(1)} {match.group(2)} {match.group(4)}', FORMAT)
        dt2 = datetime.strptime(f'{match.group(1)} {match.group(3)} {match.group(4)}', FORMAT)
        print(dt1, dt2)
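The other option mentioned above, passing the matched groups directly to datetime, could look roughly like this. It is only a sketch: parse_range and MONTHS are names I made up for illustration, and the month lookup assumes English month names (calendar.month_name follows the current locale).

```python
import re
from calendar import month_name
from datetime import datetime

# Map full month names to numbers, e.g. {"January": 1, ..., "December": 12}.
# Assumption: the locale yields English month names.
MONTHS = {name: number for number, name in enumerate(month_name) if name}

def parse_range(s):
    """Return (start, end) datetimes for a string like 'January 29-30 Meeting - 2013'."""
    match = re.search(r"(\w+)\s(\d+)-(\d+)\s.*\s(\d{4})", s)
    if not match:
        return None
    month, first_day, last_day, year = match.groups()
    return (
        datetime(int(year), MONTHS[month], int(first_day)),
        datetime(int(year), MONTHS[month], int(last_day)),
    )

print(parse_range("January 29-30 Meeting - 2013"))
```

This skips strptime entirely, so the variable text never has to fit a format string. Like the loop above, it uses a single month for both days.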
Note that you also have the April/May 30-1 complication in there. I'm not addressing that above, since it is not what you asked about.
As a bonus though:
for s in ss:
    # Group 2 is the optional first month ("April" in "April/May"), group 3 the main month
    match = re.search(r"((\w+)/)?(\w+)\s(\d+)-(\d+)\s.*\s(\d{4})", s)
    if match:
        # The start date uses the first month when there is one, otherwise the only month
        dt1 = datetime.strptime(
            f'{match.group(2) or match.group(3)} {match.group(4)} {match.group(6)}', FORMAT)
        dt2 = datetime.strptime(
            f'{match.group(3)} {match.group(5)} {match.group(6)}', FORMAT)
        print(dt1, dt2)
Also, note the interesting, if a bit hacky, solution offered by @blhsing below, involving _strptime.TimeRE. I would not recommend doing anything like that, since _strptime is a private module, but it is interesting to know you can actually change the behaviour of strptime itself that way.
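If what you want is a single strptime call per string, one way to stay within the public API is to strip the variable text with re.sub first. This is a minimal sketch, not a general wildcard: parse_start is a hypothetical helper, it only returns the first day, and it assumes the "-<day> <text> - <year>" layout from the question (it would raise ValueError on the April/May entry).

```python
import re
from datetime import datetime

def parse_start(s):
    """Parse the first day of a string like 'January 29-30 Meeting - 2013'."""
    # Remove "-30 Meeting - ": hyphen, end day, free text, separator before the year
    cleaned = re.sub(r"-\d+\s.*\s-\s", " ", s)
    return datetime.strptime(cleaned, "%B %d %Y")

print(parse_start("January 29-30 Meeting - 2013"))  # 2013-01-29 00:00:00
```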
Upvotes: 3
Reputation: 106553
You can replace the cached _strptime.TimeRE object with one that has an extra directive lazily matching any sequence of characters (this reuses %x, so the standard %x directive is no longer available afterwards):
from datetime import datetime
import _strptime
TimeRE = _strptime.TimeRE()
TimeRE.update({'x': '.*?'})
_strptime._TimeRE_cache = TimeRE
print(datetime.strptime("January 29-30 Meeting - 2013", "%B %d-%x - %Y"))
This outputs:
2013-01-29 00:00:00
Upvotes: 1