Cmd-Z MyBrain
Cmd-Z MyBrain

Reputation: 33

Convert Timezone from Timestamp column to various timezones

I have a data set like the following:

start_time_UTC
2021-09-16T12:00:00-05:00
2021-09-15T19:00:00-05:00
2021-09-16T08:18:00-05:00
2021-09-16T12:22:10-05:00

My default time is UTC, but I would like to create multiple columns based of the start_time_UTC to create cst, mst, and est.

    from datetime import datetime as dt
    import pandas as pd
    from pytz import timezone
    import pytz
    
    df = read_dataframe('my_dataset')
    
    df['time_stamp'] = df['start_time']
    utc = timezone('UTC')
    cst = timezone('US/Central')
    mst = timezone('US/Mountain')
    est = timezone('US/Eastern')

    # my issue begins at 'published_time"
    published_time = time_stamp.apply(lambda x: dt.strptime(df.time_stamp, '%a, %d %b %Y %H:%M:%S %Z'))
    time_utc = published_time.replace(tzinfo=utc)
    time_cst = published_time.replace(tzinfo=cst)
    time_mst = published_time.replace(tzinfo=mst)
    time_est = published_time.replace(tzinfo=est)
       
    # then call using time_timezone
    df['time_published_cst'] = time_cst.strftime('%I:%M:%S %p %Z')
    df['time_published_est'] = time_est.strftime('%I:%M:%S %p %Z')
    df['time_published_mst'] = time_mst.strftime('%I:%M:%S %p %Z')
    df['time_published_utc'] = time_utc.strftime('%I:%M:%S %p %Z')

Inititally I received an error for the following that said, "TypeError: strptime() argument 1 must be str, not Series":

published_time = datetime.strptime(time_stamp, '%a, %d %b %Y %H:%M:%S %Z')

So I changed it using lambda:

published_time = time_stamp.apply(lambda x: dt.strptime(df.time_stamp, '%a, %d %b %Y %H:%M:%S %Z'))

I get an error saying, " NameError: name 'time_stamp' is not defined"

If you notice anything else I could be doing wrong in this script, I would appreciate the help.

Upvotes: 1

Views: 562

Answers (1)

Anders Källmar
Anders Källmar

Reputation: 366

Your start_time_UTC column in the data provided seems to be UTC-05:00. The solution below treats it as UTC to avoid extra steps. I've only included one of the specified timezones to make it shorter as I'm not completely sure I understand what you are trying to accomplish.

You won't need datetime or pytz for this. All functionality is provided by pandas.

# Convert column to datetime UTC
df['start_time_UTC'] = pd.to_datetime(df['start_time_UTC'], utc=True)

#             start_time_UTC
#0 2021-09-16 17:00:00+00:00
#1 2021-09-16 00:00:00+00:00
#2 2021-09-16 13:18:00+00:00
#3 2021-09-16 17:22:10+00:00

# Create new column with converted timezone (still datetime)
df['start_time_est'] = df['start_time_UTC'].dt.tz_convert('US/Eastern')

#             start_time_UTC            start_time_est
#0 2021-09-16 17:00:00+00:00 2021-09-16 13:00:00-04:00
#1 2021-09-16 00:00:00+00:00 2021-09-15 20:00:00-04:00
#2 2021-09-16 13:18:00+00:00 2021-09-16 09:18:00-04:00
#3 2021-09-16 17:22:10+00:00 2021-09-16 13:22:10-04:00

# Create new column with timezone aware time_published per the specified format string
df['time_published_est'] = df['start_time_est'].dt.strftime('%I:%M:%S %p %Z')

#             start_time_UTC            start_time_est time_published_est
#0 2021-09-16 17:00:00+00:00 2021-09-16 13:00:00-04:00    01:00:00 PM EDT
#1 2021-09-16 00:00:00+00:00 2021-09-15 20:00:00-04:00    08:00:00 PM EDT
#2 2021-09-16 13:18:00+00:00 2021-09-16 09:18:00-04:00    09:18:00 AM EDT
#3 2021-09-16 17:22:10+00:00 2021-09-16 13:22:10-04:00    01:22:10 PM EDT

Upvotes: 3

Related Questions