Zach Morris
Zach Morris

Reputation: 211

Converting timezones with Pandas datetime object with an hours offset

I have been pulling my hair out trying to convert a provided time from a MYSQL database in Pandas that has a time offset. I'm provided the following data (MySQL data type is listed as TIMESTAMP(6) WITH TIME ZONE and the label suggests its GMT):

df['times']
0    2021-03-09 21:54:58-08:00
1    2021-03-09 18:42:53-08:00
2    2021-03-09 14:34:19-08:00
3    2021-03-08 21:17:42-08:00
4    2021-03-08 12:48:52-08:00
5    2021-03-08 13:39:28-08:00
6    2021-03-06 22:15:03-08:00
7    2021-03-06 22:15:03-07:00
8    2021-03-05 20:54:02-07:00
9    2021-03-06 14:37:51-07:00

df['times'].iloc[1]
datetime.datetime(2021, 3, 9, 18, 42, 53, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=57600)))

I want to convert this to remove the hours offset, and then ensure it's displayed in my Local Timezone (US/Pacific)

2021-03-09 18:42:53-08:00 = 2021-03-09 10:42:53

What is the most pythonic / Pandas way to do this? The time offset on the end of the data I receive appears to somehow already account for timezone with the offset, but it's not clear to me how to massage the value so it's showing the time in my local timezone.

Upvotes: 0

Views: 1778

Answers (2)

William Nelson
William Nelson

Reputation: 675

First, the date format you're working with is in ISO 8601 format. Each time is already in a local, non-UTC time zone. It would be very risky to assume that they're actually in UTC, or that you should subtract the offset from the listed time.

Python has built in functions that handle this format very well and will help you avoid making painful time mistakes.

I've attempted to replicate your sample data with the following code, but it's slightly different because I'm not familiar with Pandas.

import datetime
string_date_list =[
    '2021-03-09 21:54:58-08:00',
    '2021-03-09 18:42:53-08:00',
    '2021-03-09 14:34:19-08:00',
    '2021-03-08 21:17:42-08:00',
    '2021-03-08 12:48:52-08:00',
    '2021-03-08 13:39:28-08:00',
    '2021-03-06 22:15:03-08:00',
    '2021-03-06 22:15:03-07:00',
    '2021-03-05 20:54:02-07:00',
    '2021-03-06 14:37:51-07:00'
]
datetime_list = list(map(lambda x: datetime.datetime.fromisoformat(x), string_date_list))

datetime_list[1]
datetime.datetime(2021, 3, 9, 18, 42, 53, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=57600)))

In general, you should avoid removing the offset from a timestamp unless you're converting to UTC, or if you're at the final step where you present a time and date to a user. If you get in the habit of working with "timezone naive" timestamps, it's very easy to accidentally make a conversion twice, and suddenly your data is all off by eight hours, or breaks when Daylight saving time shifts.

Each object in your list is a datetime.datetime object. The relevant built-in function in the datetime library is datetime.astimezone()

To convert a single item in your list: datetime_list[1].astimezone() We still end up with a datetime object, but now it's in your local time zone. For me, I'm in MST, so I get:

datetime.datetime(2021, 3, 9, 19, 42, 53, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200), 'US Mountain Standard Time'))

Once we're certain that it's time to display data to a user, it's safe to perform format changes that remove the time zone with .strftime

print(datetime_list[1].astimezone().strftime("%Y-%m-%d %H:%M:%S"))

2021-03-09 19:42:53

To print the entire array, you could do either:

local_datetime_list = list(map(lambda x: print(x.astimezone().strftime("%Y-%m-%d %H:%M:%S")), datetime_list))

or

for time in datetime_list:
    print(time.astimezone().strftime("%Y-%m-%d %H:%M:%S"))

Both return:

2021-03-10 05:54:58
2021-03-10 02:42:53
2021-03-09 22:34:19
2021-03-09 05:17:42
2021-03-08 20:48:52
2021-03-08 21:39:28
2021-03-07 06:15:03
2021-03-07 05:15:03
2021-03-06 03:54:02
2021-03-06 21:37:51

If it's absolutely necessary to always display the time as it'd be seen in US/Pacific, (e.g., a coworker in Japan should still see these times as if they were in California), then you can use pytz.

import pytz
pacific_tz = pytz.timezone('US/Pacific')
print(datetime_list[1].astimezone(pacific_tz).strftime("%Y-%m-%d %H:%M:%S"))

2021-03-09 18:42:53

I do note that your question suggests that the expected answer is 10:42:53. If you're 100% sure that that's right. You can do this:

print(datetime_list[1].replace(tzinfo=None) + datetime_list[1].utcoffset())

or, to process the entire list:

remove_offset = list(map(lambda x: print(x.replace(tzinfo=None) + x.utcoffset()), datetime_list))
2021-03-09 13:54:58
2021-03-09 10:42:53
2021-03-09 06:34:19
2021-03-08 13:17:42
2021-03-08 04:48:52
2021-03-08 05:39:28
2021-03-06 14:15:03
2021-03-06 15:15:03
2021-03-05 13:54:02
2021-03-06 07:37:51

Upvotes: 0

bbd108
bbd108

Reputation: 998

I think this should do the trick for you. See the to_datetime documentation for more information.

s = pd.Series(
    [
        '2021-03-09 21:54:58-08:00',
        '2021-03-09 18:42:53-08:00',
        '2021-03-09 14:34:19-08:00',
        '2021-03-08 21:17:42-08:00',
        '2021-03-08 12:48:52-08:00',
        '2021-03-08 13:39:28-08:00',
        '2021-03-06 22:15:03-08:00',
        '2021-03-06 22:15:03-07:00',
        '2021-03-05 20:54:02-07:00',
        '2021-03-06 14:37:51-07:00'
    ]
)

pd.to_datetime(s, utc=True).dt.tz_convert('US/Pacific').dt.tz_localize(None) 

Upvotes: 1

Related Questions