Reputation: 33
I have a data set like the following:
start_time_UTC
2021-09-16T12:00:00-05:00
2021-09-15T19:00:00-05:00
2021-09-16T08:18:00-05:00
2021-09-16T12:22:10-05:00
My default time is UTC, but I would like to create multiple columns based of the start_time_UTC to create cst, mst, and est.
from datetime import datetime as dt
import pandas as pd
from pytz import timezone
import pytz
df = read_dataframe('my_dataset')
df['time_stamp'] = df['start_time']
utc = timezone('UTC')
cst = timezone('US/Central')
mst = timezone('US/Mountain')
est = timezone('US/Eastern')
# my issue begins at 'published_time"
published_time = time_stamp.apply(lambda x: dt.strptime(df.time_stamp, '%a, %d %b %Y %H:%M:%S %Z'))
time_utc = published_time.replace(tzinfo=utc)
time_cst = published_time.replace(tzinfo=cst)
time_mst = published_time.replace(tzinfo=mst)
time_est = published_time.replace(tzinfo=est)
# then call using time_timezone
df['time_published_cst'] = time_cst.strftime('%I:%M:%S %p %Z')
df['time_published_est'] = time_est.strftime('%I:%M:%S %p %Z')
df['time_published_mst'] = time_mst.strftime('%I:%M:%S %p %Z')
df['time_published_utc'] = time_utc.strftime('%I:%M:%S %p %Z')
Inititally I received an error for the following that said, "TypeError: strptime() argument 1 must be str, not Series":
published_time = datetime.strptime(time_stamp, '%a, %d %b %Y %H:%M:%S %Z')
So I changed it using lambda:
published_time = time_stamp.apply(lambda x: dt.strptime(df.time_stamp, '%a, %d %b %Y %H:%M:%S %Z'))
I get an error saying, " NameError: name 'time_stamp' is not defined"
If you notice anything else I could be doing wrong in this script, I would appreciate the help.
Upvotes: 1
Views: 562
Reputation: 366
Your start_time_UTC
column in the data provided seems to be UTC-05:00. The solution below treats it as UTC to avoid extra steps. I've only included one of the specified timezones to make it shorter as I'm not completely sure I understand what you are trying to accomplish.
You won't need datetime
or pytz
for this. All functionality is provided by pandas
.
# Convert column to datetime UTC
df['start_time_UTC'] = pd.to_datetime(df['start_time_UTC'], utc=True)
# start_time_UTC
#0 2021-09-16 17:00:00+00:00
#1 2021-09-16 00:00:00+00:00
#2 2021-09-16 13:18:00+00:00
#3 2021-09-16 17:22:10+00:00
# Create new column with converted timezone (still datetime)
df['start_time_est'] = df['start_time_UTC'].dt.tz_convert('US/Eastern')
# start_time_UTC start_time_est
#0 2021-09-16 17:00:00+00:00 2021-09-16 13:00:00-04:00
#1 2021-09-16 00:00:00+00:00 2021-09-15 20:00:00-04:00
#2 2021-09-16 13:18:00+00:00 2021-09-16 09:18:00-04:00
#3 2021-09-16 17:22:10+00:00 2021-09-16 13:22:10-04:00
# Create new column with timezone aware time_published per the specified format string
df['time_published_est'] = df['start_time_est'].dt.strftime('%I:%M:%S %p %Z')
# start_time_UTC start_time_est time_published_est
#0 2021-09-16 17:00:00+00:00 2021-09-16 13:00:00-04:00 01:00:00 PM EDT
#1 2021-09-16 00:00:00+00:00 2021-09-15 20:00:00-04:00 08:00:00 PM EDT
#2 2021-09-16 13:18:00+00:00 2021-09-16 09:18:00-04:00 09:18:00 AM EDT
#3 2021-09-16 17:22:10+00:00 2021-09-16 13:22:10-04:00 01:22:10 PM EDT
Upvotes: 3