Reputation: 945
Let's consider the following DataFrame spanning 10am to 8pm on Jan 16:
date_range1 = pd.date_range(dt(2017,1,16,10), dt(2017,1,16, 20), freq='2H')
df = pd.DataFrame(data = np.random.rand(len(date_range1),2), index = date_range1)
I reindex it with a slightly longer DatetimeIndex spanning 0am to 11pm and obtain the desired result, with NaNs filling the time ranges between 0-10am and 8-11pm where there is no data:
date_range2 = pd.date_range(dt(2017,1,16,0), dt(2017,1,16, 23), freq='2H')
df.reindex(date_range2)
However, if I modify the timezone of df first, then the same reindex operation yields a DataFrame completely filled with NaN values:
df = df.tz_localize("Europe/Helsinki").tz_convert('UTC')
df.reindex(date_range2)
Does anyone have any idea what is happening here?
Upvotes: 3
Views: 607
Reputation: 29711
Fix:
One workaround is to drop the timezone information from the timezone-aware (tz) DatetimeIndex after converting to UTC, using tz_convert(None) or, once the index is already in UTC as below, tz_localize(None). The offset between the two zones (here UTC+02:00) is then applied to the resulting naive timestamps, and these reindex properly.
np.random.seed(42)
df1 = df.tz_localize("Europe/Helsinki").tz_convert('UTC').tz_localize(None)
df1.reindex(date_range2)
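For reference, here is a self-contained sketch of the workaround above that can be run as-is. The seed is set before the data is created so the values are reproducible, and the lowercase '2h' frequency alias is my choice to avoid the deprecation warning newer pandas versions emit for '2H'. The name result is only illustrative.

```python
from datetime import datetime as dt

import numpy as np
import pandas as pd

np.random.seed(42)  # seed before creating the data so the values are reproducible

date_range1 = pd.date_range(dt(2017, 1, 16, 10), dt(2017, 1, 16, 20), freq="2h")
date_range2 = pd.date_range(dt(2017, 1, 16, 0), dt(2017, 1, 16, 23), freq="2h")
df = pd.DataFrame(np.random.rand(len(date_range1), 2), index=date_range1)

# Interpret the naive timestamps as Helsinki local time, convert them to UTC,
# then drop the timezone so the index is naive again.
df1 = df.tz_localize("Europe/Helsinki").tz_convert("UTC").tz_localize(None)
result = df1.reindex(date_range2)

print(df1.index.tz)                       # None: the index is naive again
print(df1.index[0])                       # 2017-01-16 08:00:00 (10:00 Helsinki time)
print(result.notna().all(axis=1).sum())   # 6: every original row found a matching label
```

Both indexes are timezone-naive here, so the labels can match; the price is that the result no longer records that its timestamps are in UTC.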
Right approach:
By default, the tz keyword argument in pd.date_range is None and not "UTC". We need to change this accordingly, since under the hood the reindexing compares UTC timestamps:
date_range2 = date_range2.tz_localize('UTC')
df1 = df.tz_localize("Europe/Helsinki").tz_convert('UTC')
df1.reindex(date_range2)
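A variant of the same idea, as a minimal sketch: instead of changing date_range2 afterwards, the target index can be built timezone-aware from the start by passing tz="UTC" to pd.date_range. The names df_utc, date_range2_utc and result are only illustrative, and '2h' is used in place of '2H' to avoid the deprecation warning in newer pandas versions.

```python
from datetime import datetime as dt

import numpy as np
import pandas as pd

date_range1 = pd.date_range(dt(2017, 1, 16, 10), dt(2017, 1, 16, 20), freq="2h")
df = pd.DataFrame(np.random.rand(len(date_range1), 2), index=date_range1)

# The data index becomes timezone-aware (UTC) after these two calls.
df_utc = df.tz_localize("Europe/Helsinki").tz_convert("UTC")

# Build the target index as timezone-aware too, so both sides are comparable.
date_range2_utc = pd.date_range(
    dt(2017, 1, 16, 0), dt(2017, 1, 16, 23), freq="2h", tz="UTC"
)

result = df_utc.reindex(date_range2_utc)

print(result.index.tz)                    # UTC
print(result.notna().all(axis=1).sum())   # 6: the original rows sit at 08:00-18:00 UTC
```

This keeps the timezone information on the result, which the naive workaround discards.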
Upvotes: 3