doctorer
doctorer

Reputation: 1772

shifting timezone for reshaped pandas dataframe

I am using Pandas dataframes with DatetimeIndex to manipulate timeseries data. The data is stored at UTC time and I usually keep it that way (with naive DatetimeIndex), and only use timezones for output. I like it that way because nothing in the world confuses me more than trying to manipuluate timezones.

e.g.

In: ts = pd.date_range('2017-01-01 00:00','2017-12-31 23:30',freq='30Min')
    data = np.random.rand(17520,1)
    df= pd.DataFrame(data,index=ts,columns = ['data'])
    df.head()

Out[15]: 
                         data
2017-01-01 00:00:00  0.697478
2017-01-01 00:30:00  0.506914
2017-01-01 01:00:00  0.792484
2017-01-01 01:30:00  0.043271
2017-01-01 02:00:00  0.558461

I want to plot a chart of data versus time for each day of the year so I reshape the dataframe to have time along the index and dates for columns

df.index = [df.index.time,df.index.date]
df_new = df['data'].unstack()
In: df_new.head()
Out : 
          2017-01-01  2017-01-02  2017-01-03  2017-01-04  2017-01-05  \
00:00:00    0.697478    0.143626    0.189567    0.061872    0.748223   
00:30:00    0.506914    0.470634    0.430101    0.551144    0.081071   
01:00:00    0.792484    0.045259    0.748604    0.305681    0.333207   
01:30:00    0.043271    0.276888    0.034643    0.413243    0.921668   
02:00:00    0.558461    0.723032    0.293308    0.597601    0.120549   

If I'm not worried about timezones i can plot like this:

fig, ax = plt.subplots()
ax.plot(df_new.index,df_new)

but I want to plot the data in the local timezone (tz = pytz.timezone('Australia/Sydney') making allowance for daylight savings time, but the times and dates are no longer Timestamp objects so I can't use Pandas timezone handling. Or can I?

Assuming I can't, I'm trying to do the shift manually, (given DST starts 1/10 at 2am and finishes 1/4 at 2am), so I've got this far:

df_new[[c for c in df_new.columns if c >= dt.datetime(2017,4,1) and c <dt.datetime(2017,10,1)]].shift_by(+10)
df_new[[c for c in df_new.columns if c < dt.datetime(2017,4,1) or c >= dt.datetime(2017,10,1)]].shift_by(+11)

but am not sure how to write the function shift_by. (This doesn't handle midnight to 2am on teh changeover days correctly, which is not ideal, but I could live with)

Upvotes: 1

Views: 1132

Answers (1)

cs95
cs95

Reputation: 402603

Use dt.tz_localize + dt.tz_convert to convert the dataframe dates to a particular timezone.

df.index = df.index.tz_localize('UTC').tz_convert('Australia/Sydney')
df.index = [df.index.time, df.index.date]

Be a little careful when creating the MuliIndex - as you observed, it creates two rows of duplicate timestamps, so if that's the case, get rid of it with duplicated:

df = df[~df.index.duplicated()]
df = df['data'].unstack()

You can also create subplots with df.plot:

df.plot(subplots=True)
plt.show()

Upvotes: 3

Related Questions