Reputation: 1970
I've got a DataFrame (6M rows) with 2 columns, one column contains local time (timezone naive) and the other contains the timezone. Something like this:
| | SCHEDULED_DEPARTURE | ORIGIN_TZ |
|---:|:----------------------|:--------------------|
| 0 | 2020-11-30 11:40:00 | America/New_York |
| 1 | 2020-11-30 16:51:00 | America/New_York |
| 2 | 2020-11-30 09:05:00 | America/Chicago |
| 3 | 2020-11-30 19:18:00 | America/Chicago |
| 4 | 2020-11-30 10:36:00 | America/New_York |
| 5 | 2020-11-30 12:10:00 | America/Los_Angeles |
| 6 | 2020-11-30 16:05:00 | America/New_York |
| 7 | 2020-11-30 12:14:00 | America/New_York |
| 8 | 2020-11-30 16:05:00 | America/New_York |
| 9 | 2020-11-30 12:40:00 | America/Chicago |
I'm trying to localize each row of SCHEDULED_DEPARTURE
using the a for
routine that subsets the df
by each timezone, adds the timezone, and keeps looping:
for tz in df['ORIGIN_TZ'].unique():
mask_tz = (df['ORIGIN_TZ'] == tz)
df.loc[mask_tz,'SCHEDULED_DEPARTURE'] = df.loc[mask_tz,'SCHEDULED_DEPARTURE'].dt.tz_localize(tz)
Oddly enough, sometimes it works and sometimes it returns the following error:
AttributeError: Can only use .dt accessor with datetimelike values
When extracting the SCHEDULED_DEPARTURE
column, the type is clearly datetime-like:
Name: SCHEDULED_DEPARTURE, Length: 5714008, dtype: datetime64[ns]
Do you know how to solve this? Is it possible to have more than 1 timezone per column?
Here's code to replicate thee sample df:
df = pd.DataFrame({'SCHEDULED_DEPARTURE': {0: pd.Timestamp('2020-11-30 10:15:00'), 1: pd.Timestamp('2020-11-30 07:55:00'), 2: pd.Timestamp('2020-11-30 06:00:00'), 3: pd.Timestamp('2020-11-30 16:23:00'), 4: pd.Timestamp('2020-11-30 07:35:00'), 5: pd.Timestamp('2020-11-30 08:00:00'), 6: pd.Timestamp('2020-11-30 08:50:00'), 7: pd.Timestamp('2020-11-30 13:45:00'), 8: pd.Timestamp('2020-11-30 10:15:00'), 9: pd.Timestamp('2020-11-30 20:00:00')}, 'ORIGIN_TZ': {0: 'America/New_York', 1: 'America/New_York', 2: 'America/Denver', 3: 'America/New_York', 4: 'America/Chicago', 5: 'America/Chicago', 6: 'America/Los_Angeles', 7: 'America/Chicago', 8: 'America/New_York', 9: 'America/Los_Angeles'}})
Upvotes: 2
Views: 256
Reputation: 150785
Once you do:
df.loc[mask_tz,'SCHEDULED_DEPARTURE'] = df.loc[mask_tz,'SCHEDULED_DEPARTURE'].dt.tz_localize(tz)
your column turns into object dtype and the next .dt
access fails. Try to work on a copy:
s = df['SCHEDULED_DEPARTURE'].copy()
for tz in df['ORIGIN_TZ'].unique():
mask_tz = (df['ORIGIN_TZ'] == tz)
df.loc[mask_tz,'SCHEDULED_DEPARTURE'] = s.loc[mask_tz].dt.tz_localize(tz)
Then df.loc[0,'SCHEDULED_DEPARTURE']
would give:
Timestamp('2020-11-30 10:15:00-0500', tz='America/New_York')
Your SCHEDULED_DEPARTURE
column is still object
dtype though.
Upvotes: 1