Radar

Reputation: 945

Python Pandas: Reindex DataFrame after Timezone conversion

Let's consider the following DataFrame spanning 10am to 8pm on Jan 16:

date_range1 = pd.date_range(dt(2017,1,16,10), dt(2017,1,16, 20), freq='2H')
df = pd.DataFrame(data = np.random.rand(len(date_range1),2), index = date_range1)

I reindex it with a slightly longer DatetimeIndex spanning midnight to 11pm and obtain the desired result, with NaNs filling the slots before 10am and after 8pm where there is no data:

date_range2 = pd.date_range(dt(2017,1,16,0), dt(2017,1,16, 23), freq='2H')
df.reindex(date_range2)

However, if I change the timezone of df first, the same reindex operation yields a DataFrame filled entirely with NaN values:

df = df.tz_localize("Europe/Helsinki").tz_convert('UTC')  
df.reindex(date_range2)

Does anyone have any idea what is happening here?

Upvotes: 3

Views: 607

Answers (1)

Nickil Maveli

Reputation: 29711

Fix:

One workaround is to strip the timezone information from the tz-aware DatetimeIndex after converting to UTC, using tz_localize(None) (tz_convert(None) has the same effect here). The timestamps keep their UTC wall-clock values, so the Helsinki offset (UTC+02:00 in January) is already applied: 10:00 Helsinki time becomes a naive 08:00.

These naive timestamps then reindex properly against date_range2.

np.random.seed(42)
# df is the original tz-naive frame; tz_localize raises on an index that is already tz-aware
df1 = df.tz_localize("Europe/Helsinki").tz_convert('UTC').tz_localize(None)
df1.reindex(date_range2)
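
To check the workaround end to end, here is a minimal self-contained sketch. It rebuilds the question's frame from scratch; the names `two_hours`, `naive_utc` and `result` are only illustrative, and the frequency is passed as a `pd.Timedelta` rather than the `'2H'` alias, which newer pandas releases deprecate.

```python
from datetime import datetime as dt

import numpy as np
import pandas as pd

two_hours = pd.Timedelta(hours=2)  # stands in for freq='2H' from the question

# The question's tz-naive frame: 10:00 to 20:00 in 2-hour steps.
date_range1 = pd.date_range(dt(2017, 1, 16, 10), dt(2017, 1, 16, 20), freq=two_hours)
df = pd.DataFrame(np.random.rand(len(date_range1), 2), index=date_range1)

# The tz-naive target index: 00:00 to 22:00 in 2-hour steps.
date_range2 = pd.date_range(dt(2017, 1, 16, 0), dt(2017, 1, 16, 23), freq=two_hours)

# Interpret the timestamps as Helsinki time, convert to UTC, then drop the tz.
naive_utc = df.tz_localize("Europe/Helsinki").tz_convert("UTC").tz_localize(None)
result = naive_utc.reindex(date_range2)

print(naive_utc.index[0])                # 2017-01-16 08:00:00
print(result.notna().all(axis=1).sum())  # 6 rows carry data, the rest are NaN
```

Keep in mind that the resulting index is naive, so it is up to you to remember that the labels are UTC times.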

Right approach:

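By default, the tz keyword argument of pd.date_range is None and not "UTC", so date_range2 is tz-naive while the converted df has a UTC-aware index. A tz-naive label never matches a tz-aware one during reindexing, which is why every row comes back as NaN. Make the target index UTC-aware as well, so that both sides are compared on their UTC timestamps: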

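# tz cannot be assigned directly on a DatetimeIndex in current pandas; localize instead
date_range2 = date_range2.tz_localize('UTC')
df1 = df.tz_localize("Europe/Helsinki").tz_convert('UTC')
df1.reindex(date_range2)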
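
For comparison, here is a small self-contained sketch that puts the failing and the working reindex side by side. The variable names (`naive_target`, `aware_target`, `mismatched`, `matched`) are made up for illustration, and the target index is built with `tz="UTC"` directly in `pd.date_range`, which should be equivalent to localizing a naive range afterwards.

```python
from datetime import datetime as dt

import numpy as np
import pandas as pd

two_hours = pd.Timedelta(hours=2)  # stands in for freq='2H' from the question

# The question's frame, localized to Helsinki time and converted to UTC.
date_range1 = pd.date_range(dt(2017, 1, 16, 10), dt(2017, 1, 16, 20), freq=two_hours)
df = pd.DataFrame(np.random.rand(len(date_range1), 2), index=date_range1)
df_utc = df.tz_localize("Europe/Helsinki").tz_convert("UTC")

# Same wall-clock range twice: once tz-naive (the default), once UTC-aware.
naive_target = pd.date_range(dt(2017, 1, 16, 0), dt(2017, 1, 16, 23), freq=two_hours)
aware_target = pd.date_range(dt(2017, 1, 16, 0), dt(2017, 1, 16, 23),
                             freq=two_hours, tz="UTC")

mismatched = df_utc.reindex(naive_target)  # naive labels never match aware ones
matched = df_utc.reindex(aware_target)     # both sides are UTC-aware

print(df_utc.index.tz, naive_target.tz, aware_target.tz)
print(mismatched.notna().any(axis=1).sum())  # 0 rows carry data
print(matched.notna().all(axis=1).sum())     # 6 rows carry data
```

Checking `.tz` on both indexes before reindexing is a quick way to spot this kind of mismatch.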

Upvotes: 3
