Reputation: 2061
I'm using Python 3.7 and trying to figure out the correct format string to get this code to work:
dt = datetime.strptime("4 January 2022, 22:03 GMT-5", "%-d %b %Y, %H:%M %Zz")
The above line always fails. Is there something I can do to get it to parse? I assume it's failing on the "GMT-5" part.
Edit: Adding context: the input string is scraped from a website, so I need a way to turn it into a Python datetime object so my code can understand when the event took place. I'm not really sure how I could change the input in code to match the format strptime requires.
Upvotes: 1
Views: 857
Reputation: 51
I do not really like the current solution because it is not generic enough: splitting on GMT will not work if the timezone is UTC.
Instead, I prefer to fix the timezone offset from +/-h to +/-hhmm.
Here is what I use:
from datetime import datetime
import re

def parse_date(string_date, _format):
    # Strip a short UTC offset such as "+10" or "-5" from the string
    fixed_string_date = re.sub(r'[+-]\d{1,2}', '', string_date)
    # Whatever was removed from the end is the offset in hours (0 if there was none)
    offset = int(string_date.replace(fixed_string_date, '') or 0)
    # Re-append the offset in the +/-hhmm form that %z expects
    fixed_string_date += f"{offset:+03d}00" if offset else ""
    return datetime.strptime(fixed_string_date, _format)

print(parse_date("Mon, Feb 24, 2025, 09:05:00 AM UTC+10", "%a, %b %d, %Y, %I:%M:%S %p %Z%z"))
print(parse_date("Sat, Jan 01, 2023, 05:43:26 PM UTC", "%a, %b %d, %Y, %I:%M:%S %p %Z"))
print(parse_date("4 January 2022, 22:03 GMT-5", "%d %B %Y, %H:%M %Z%z"))
The ideal solution would be for %z to match single-digit offsets.
Otherwise, dateparser also works well:
import dateparser # pip install dateparser
print(dateparser.parse("Mon, Feb 24, 2025, 09:05:00 AM UTC+10"))
print(dateparser.parse("Sat, Jan 01, 2023, 05:43:26 PM UTC"))
print(dateparser.parse("4 January 2022, 22:03 GMT-5"))
Upvotes: 0
Reputation: 25634
The %z parsing directive won't parse an hour-only UTC offset (docs: requires the ±HHMM[SS[.ffffff]] form). But you can derive a timezone object from a timedelta and set it like this:
from datetime import datetime, timedelta, timezone
s = "4 January 2022, 22:03 GMT-5"
parts = s.split('GMT')
dt = (datetime.strptime(parts[0].strip(), "%d %B %Y, %H:%M") # parse to datetime w/o offset
.replace(tzinfo=timezone(timedelta(hours=int(parts[1]))))) # add UTC offset
print(dt)
# 2022-01-04 22:03:00-05:00
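If the scraped strings may end in either GMT or UTC, with or without an offset, the same timedelta idea can be wrapped in a small helper. This is only a sketch: the function name `parse_with_hour_offset`, the regex, and the default format are assumptions made for illustration, and it assumes the offset is always given in whole hours at the very end of the string.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern: a trailing "GMT" or "UTC", optionally followed by +H / -HH.
_OFFSET_RE = re.compile(r"\s*(?:GMT|UTC)\s*([+-]\d{1,2})?\s*$")

def parse_with_hour_offset(text, fmt="%d %B %Y, %H:%M"):
    """Parse a date string whose suffix is GMT/UTC plus an optional hour offset."""
    match = _OFFSET_RE.search(text)
    if match is None:
        raise ValueError(f"no GMT/UTC suffix found in {text!r}")
    hours = int(match.group(1) or 0)                   # no offset means plain UTC
    naive = datetime.strptime(text[:match.start()], fmt)  # parse the part before the suffix
    return naive.replace(tzinfo=timezone(timedelta(hours=hours)))

print(parse_with_hour_offset("4 January 2022, 22:03 GMT-5"))
# 2022-01-04 22:03:00-05:00
print(parse_with_hour_offset("4 January 2022, 22:03 UTC"))
# 2022-01-04 22:03:00+00:00
```

Offsets with minutes (for example +5:30) are not covered by this sketch and would need a wider pattern.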
Upvotes: 1
Reputation: 3608
You're using the wrong format for the month and invalid text for the UTC offset (it has to be four digits, as described in the documentation):
>>> datetime.datetime.strptime("4 January 2022, 22:03 GMT-0500", "%d %B %Y, %H:%M %Z%z")
datetime.datetime(2022, 1, 4, 22, 3, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=68400), 'GMT'))
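Since the input is scraped and cannot be edited by hand, one way to get the four-digit form "in code" is to pad the trailing offset before calling strptime. The sketch below assumes the offset is a whole number of hours at the very end of the string; the helper name `pad_hour_offset` is made up for illustration.

```python
import re
from datetime import datetime

def pad_hour_offset(text):
    """Rewrite a trailing +H / -HH offset as +HHMM so that %z accepts it."""
    return re.sub(
        r"([+-])(\d{1,2})$",
        lambda m: f"{m.group(1)}{int(m.group(2)):02d}00",
        text,
    )

scraped = "4 January 2022, 22:03 GMT-5"
fixed = pad_hour_offset(scraped)
print(fixed)
# 4 January 2022, 22:03 GMT-0500

dt = datetime.strptime(fixed, "%d %B %Y, %H:%M %Z%z")
print(dt.isoformat())
```

A string that already carries a four-digit offset is left unchanged, because the pattern only matches one or two digits directly before the end of the string.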
Upvotes: 0