Reputation: 3824
I have a pandas DataFrame with a tz-aware Timestamp column, and I called groupby(level=0).first() on it. The result is incorrect: the times come back shifted by four hours (07:00 becomes 11:00) while still labelled -04:00. Am I missing something, or is this a pandas bug?
import pandas as pd
x = pd.DataFrame(index=[1, 1, 2, 2, 2], data=pd.date_range("7:00", "9:00", freq="30min", tz="US/Eastern"))
In [58]: x
Out[58]:
0
1 2016-09-08 07:00:00-04:00
1 2016-09-08 07:30:00-04:00
2 2016-09-08 08:00:00-04:00
2 2016-09-08 08:30:00-04:00
2 2016-09-08 09:00:00-04:00
In [59]: x.groupby(level=0).first()
Out[59]:
0
1 2016-09-08 11:00:00-04:00
2 2016-09-08 12:00:00-04:00
Upvotes: 3
Views: 188
Reputation: 3824
This is actually a pandas bug reported here:
https://github.com/pydata/pandas/issues/10668
Upvotes: 0
Reputation: 29711
I don't believe that it is a bug. If you go through the pytz docs, it is clearly indicated that for the US/Eastern timezone there is no way to specify whether a time falls before or after the end-of-daylight-saving-time transition.
In such cases, sticking with UTC seems to be the best option.
Excerpt from the docs:
Be aware that timezones (e.g., pytz.timezone('US/Eastern')) are not necessarily equal across timezone versions. So if data is localized to a specific timezone in the HDFStore using one version of a timezone library and that data is updated with another version, the data will be converted to UTC since these timezones are not considered equal. Either use the same version of timezone library or use tz_convert with the updated timezone definition.
The conversion can be done as follows:
A: using the tz_localize method to localize tz-naive datetimes to UTC
data = pd.date_range("7:00", "9:00", freq="30min").tz_localize('UTC')
B: using the tz_convert method to convert tz-aware data to another time zone
df = pd.DataFrame(index=[1,1,2,2,2], data=data.tz_convert('US/Eastern'))
df.groupby(level=0).first()
which results in:
0
1 2016-09-09 07:00:00-04:00
2 2016-09-09 08:00:00-04:00
#0 datetime64[ns, US/Eastern]
#dtype: object
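A related way to apply the "stick with UTC" advice to data that is already tz-aware is to aggregate in UTC and convert back afterwards. The following is only a sketch under some assumptions: the column name ts and the fixed date 2016-09-08 are illustrative choices that are not in the original post, and the result is cross-checked against a selection that does not use a groupby aggregation at all.

```python
import pandas as pd

# Fixed date so the example is reproducible (the original relied on today's date).
data = pd.date_range("2016-09-08 07:00", "2016-09-08 09:00",
                     freq="30min", tz="US/Eastern")
# "ts" is an illustrative column name, not one used in the original post.
df = pd.DataFrame({"ts": data}, index=[1, 1, 2, 2, 2])

# Aggregate in UTC, then convert the result back to the local zone.
utc = df["ts"].dt.tz_convert("UTC")
first = utc.groupby(level=0).first().dt.tz_convert("US/Eastern")

# Independent check: keep the first row for each index label
# without going through a groupby aggregation.
expected = df["ts"][~df.index.duplicated(keep="first")]

print(first)
print(first.tolist() == expected.tolist())
```

If the final comparison prints False on your installation, the groupby result differs from the plain first-row selection, which would be consistent with the behaviour described in the question.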
Upvotes: 2