DataFrame column with Timestamps, need to localize multiple different timezones (AttributeError: Can only use .dt accessor with datetimelike values)

Question

I've got a DataFrame (6M rows) with 2 columns, one column contains local time (timezone naive) and the other contains the timezone. Something like this:

|    | SCHEDULED_DEPARTURE   | ORIGIN_TZ           |
|---:|:----------------------|:--------------------|
|  0 | 2020-11-30 11:40:00   | America/New_York    |
|  1 | 2020-11-30 16:51:00   | America/New_York    |
|  2 | 2020-11-30 09:05:00   | America/Chicago     |
|  3 | 2020-11-30 19:18:00   | America/Chicago     |
|  4 | 2020-11-30 10:36:00   | America/New_York    |
|  5 | 2020-11-30 12:10:00   | America/Los_Angeles |
|  6 | 2020-11-30 16:05:00   | America/New_York    |
|  7 | 2020-11-30 12:14:00   | America/New_York    |
|  8 | 2020-11-30 16:05:00   | America/New_York    |
|  9 | 2020-11-30 12:40:00   | America/Chicago     |

I'm trying to localize each row of SCHEDULED_DEPARTURE using the a for routine that subsets the df by each timezone, adds the timezone, and keeps looping:

for tz in df['ORIGIN_TZ'].unique():
    mask_tz = (df['ORIGIN_TZ'] == tz)
    df.loc[mask_tz,'SCHEDULED_DEPARTURE'] = df.loc[mask_tz,'SCHEDULED_DEPARTURE'].dt.tz_localize(tz)

Oddly enough, sometimes it works and sometimes it returns the following error:

AttributeError: Can only use .dt accessor with datetimelike values

When extracting the SCHEDULED_DEPARTURE column, the type is clearly datetime-like:

Name: SCHEDULED_DEPARTURE, Length: 5714008, dtype: datetime64[ns]

Do you know how to solve this? Is it possible to have more than 1 timezone per column?

Here's code to replicate thee sample df:

df = pd.DataFrame({'SCHEDULED_DEPARTURE': {0: pd.Timestamp('2020-11-30 10:15:00'), 1: pd.Timestamp('2020-11-30 07:55:00'), 2: pd.Timestamp('2020-11-30 06:00:00'), 3: pd.Timestamp('2020-11-30 16:23:00'), 4: pd.Timestamp('2020-11-30 07:35:00'), 5: pd.Timestamp('2020-11-30 08:00:00'), 6: pd.Timestamp('2020-11-30 08:50:00'), 7: pd.Timestamp('2020-11-30 13:45:00'), 8: pd.Timestamp('2020-11-30 10:15:00'), 9: pd.Timestamp('2020-11-30 20:00:00')}, 'ORIGIN_TZ': {0: 'America/New_York', 1: 'America/New_York', 2: 'America/Denver', 3: 'America/New_York', 4: 'America/Chicago', 5: 'America/Chicago', 6: 'America/Los_Angeles', 7: 'America/Chicago', 8: 'America/New_York', 9: 'America/Los_Angeles'}})

Quang Hoang · Accepted Answer

Once you do:

df.loc[mask_tz,'SCHEDULED_DEPARTURE'] = df.loc[mask_tz,'SCHEDULED_DEPARTURE'].dt.tz_localize(tz)

your column turns into object dtype and the next .dt access fails. Try to work on a copy:

s = df['SCHEDULED_DEPARTURE'].copy()
for tz in df['ORIGIN_TZ'].unique():
    mask_tz = (df['ORIGIN_TZ'] == tz)
    df.loc[mask_tz,'SCHEDULED_DEPARTURE'] = s.loc[mask_tz].dt.tz_localize(tz)

Then df.loc[0,'SCHEDULED_DEPARTURE'] would give:

Timestamp('2020-11-30 10:15:00-0500', tz='America/New_York')

Your SCHEDULED_DEPARTURE column is still object dtype though.

DataFrame column with Timestamps, need to localize multiple different timezones (AttributeError: Can only use .dt accessor with datetimelike values)

Answers (1)

Related Questions