lufthansa747

Reputation: 2061

Python strptime parse timezone format "GMT+-H"

I'm using Python 3.7 and trying to figure out the correct format string to get this code to work:

dt = datetime.strptime("4 January 2022, 22:03 GMT-5", "%-d %b %Y, %H:%M %Zz")

The above line always fails. Is there something I can do to get it to parse? I assume it's failing on the "GMT-5" part.

Edit: Adding context. The input string is scraped from a website, so I need a way to turn it into a Python datetime object so my code can understand when the event took place. I'm not sure how I could change the input in code to match the format strptime requires.

Upvotes: 1

Views: 857

Answers (3)

LucBerge

Reputation: 51

I do not really like the current solution because it is not generic enough. Splitting on GMT will not work if the timezone is written as UTC.

Instead, I prefer to rewrite the offset from +/-h to +/-hhmm.

Here is what I use:

from datetime import datetime
import re


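def parse_date(string_date, _format):
    # Strip the hour-only offset (e.g. "+10" or "-5") from the string.
    fixed_string_date = re.sub(r'[+-]\d{1,2}', '', string_date)
    # What was stripped is the offset in hours; no offset means 0.
    offset = int(string_date.replace(fixed_string_date, '') or 0)
    # Re-append the offset in the +/-hhmm form that %z expects.
    fixed_string_date += f"{offset:+03d}00" if offset else ""
    return datetime.strptime(fixed_string_date, _format)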


print(parse_date("Mon, Feb 24, 2025, 09:05:00 AM UTC+10", "%a, %b %d, %Y, %I:%M:%S %p %Z%z"))
print(parse_date("Sat, Jan 01, 2023, 05:43:26 PM UTC", "%a, %b %d, %Y, %I:%M:%S %p %Z"))
print(parse_date("4 January 2022, 22:03 GMT-5", "%d %B %Y, %H:%M %Z%z"))

Ideally, %z itself would match one-digit offsets.
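
A variant of the same idea, as a minimal sketch: anchoring the pattern to the end of the string keeps it from touching digits elsewhere in the input (for example a date written as 2022-01-04). The helper name normalize_offset is hypothetical, and the sketch assumes the offset, when present, is the last token of the string.

```python
import re
from datetime import datetime

# Hypothetical helper: rewrite a trailing hour-only offset ("-5", "+10")
# into the +/-HHMM form that %z accepts. Assumes the offset ends the string.
_OFFSET_AT_END = re.compile(r"([+-])(\d{1,2})$")


def normalize_offset(text):
    # Keep the sign, zero-pad the hours to two digits, append "00" minutes.
    return _OFFSET_AT_END.sub(
        lambda m: f"{m.group(1)}{int(m.group(2)):02d}00", text
    )


fixed = normalize_offset("4 January 2022, 22:03 GMT-5")
print(fixed)  # 4 January 2022, 22:03 GMT-0500

dt = datetime.strptime(fixed, "%d %B %Y, %H:%M %Z%z")
print(dt.isoformat())  # 2022-01-04T22:03:00-05:00
```

Strings without a trailing offset, or with one already in four-digit form, pass through unchanged, so the caller still has to pick a format with or without %z.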


Otherwise, dateparser also works well.

import dateparser # pip install dateparser

print(dateparser.parse("Mon, Feb 24, 2025, 09:05:00 AM UTC+10"))
print(dateparser.parse("Sat, Jan 01, 2023, 05:43:26 PM UTC"))
print(dateparser.parse("4 January 2022, 22:03 GMT-5"))

Upvotes: 0

FObersteiner

Reputation: 25634

The %z parsing directive won't parse an hour-only UTC offset (per the docs, it requires the ±HHMM[SS[.ffffff]] form). But you can derive a timezone object from a timedelta and set it like this:

from datetime import datetime, timedelta, timezone

s = "4 January 2022, 22:03 GMT-5"

parts = s.split('GMT')

dt = (datetime.strptime(parts[0].strip(), "%d %B %Y, %H:%M") # parse to datetime w/o offset
          .replace(tzinfo=timezone(timedelta(hours=int(parts[1]))))) # add UTC offset

print(dt)
# 2022-01-04 22:03:00-05:00
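
If the scraped strings may end in either GMT or UTC, with or without an offset, the split above can be swapped for a regular expression. This is only a sketch under those assumptions: the function name parse_with_hour_offset is made up for illustration, and a bare "GMT" or "UTC" is treated as offset zero.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed input shape: "<date text> GMT" or "<date text> UTC", optionally
# followed by a signed hour-only offset such as "-5" or "+10".
_TAIL = re.compile(r"\s*(?:GMT|UTC)([+-]\d{1,2})?$")


def parse_with_hour_offset(text, fmt):
    match = _TAIL.search(text)
    if match is None:
        raise ValueError(f"no GMT/UTC suffix in {text!r}")
    hours = int(match.group(1) or 0)  # bare "GMT"/"UTC" means offset zero
    # Parse everything before the suffix as a naive datetime, then attach
    # a fixed-offset timezone built from the captured hours.
    naive = datetime.strptime(text[:match.start()], fmt)
    return naive.replace(tzinfo=timezone(timedelta(hours=hours)))


print(parse_with_hour_offset("4 January 2022, 22:03 GMT-5", "%d %B %Y, %H:%M"))
# 2022-01-04 22:03:00-05:00
```

Note that fmt describes only the date and time part here; the timezone suffix is handled separately.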

Upvotes: 1

Jan Wilamowski

Reputation: 3608

You're using the wrong format for the month and invalid text for the UTC offset (it has to be four digits, as described in the documentation):

>>> datetime.datetime.strptime("4 January 2022, 22:03 GMT-0500", "%d %B %Y, %H:%M %Z%z")
datetime.datetime(2022, 1, 4, 22, 3, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=68400), 'GMT'))
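
To see both points in isolation, here is a small sketch that checks which text/format pairs strptime accepts. The helper parses is hypothetical, and the results noted below assume the default English (C) locale for month names.

```python
from datetime import datetime


def parses(text, fmt):
    """Return True if strptime accepts the text/format pair."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


print(parses("4 Jan 2022", "%d %b %Y"))         # abbreviated month with %b
print(parses("4 January 2022", "%d %B %Y"))     # full month name with %B
print(parses("4 January 2022", "%d %b %Y"))     # full month name with %b
print(parses("22:03 GMT-0500", "%H:%M %Z%z"))   # four-digit offset
print(parses("22:03 GMT-5", "%H:%M %Z%z"))      # hour-only offset
```

Under that locale assumption, the first, second and fourth calls should print True, while the third and fifth print False.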

Upvotes: 0
