motam79

Reputation: 3824

Pandas groupby on a tz-aware datetime column, possible bug

I have a pandas DataFrame with a column of tz-aware Timestamps. When I call groupby(level=0).first() on it, I get an incorrect result: the returned times are shifted by four hours. Am I missing something, or is this a pandas bug?

import pandas as pd
x = pd.DataFrame(index=[1, 1, 2, 2, 2], data=pd.date_range("7:00", "9:00", freq="30min", tz="US/Eastern"))

In [58]: x
Out[58]: 
                          0
1 2016-09-08 07:00:00-04:00
1 2016-09-08 07:30:00-04:00
2 2016-09-08 08:00:00-04:00
2 2016-09-08 08:30:00-04:00
2 2016-09-08 09:00:00-04:00

In [59]: x.groupby(level=0).first()
Out[59]: 
                          0
1 2016-09-08 11:00:00-04:00
2 2016-09-08 12:00:00-04:00

Upvotes: 3

Views: 188

Answers (2)

motam79

Reputation: 3824

This is actually a pandas bug reported here:

https://github.com/pydata/pandas/issues/10668
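To check whether an installed pandas version shows this behaviour, a minimal sketch along these lines may help. It compares groupby(level=0).first() with the first row per index label selected without any groupby aggregation. The fixed date and the names stamps, expected and is_affected are assumptions made for illustration; they are not from the bug report.

```python
import pandas as pd

# A fixed date keeps the example reproducible (the question used the current day).
stamps = pd.date_range("2016-09-08 07:00", "2016-09-08 09:00",
                       freq="30min", tz="US/Eastern")
x = pd.DataFrame({0: stamps}, index=[1, 1, 2, 2, 2])

# The call from the question.
result = x.groupby(level=0).first()

# Reference answer: keep the first row of each index label via boolean masking.
expected = x[~x.index.duplicated(keep="first")]

# True if groupby().first() disagrees with the reference answer.
is_affected = not result[0].equals(expected[0])
print("groupby().first() differs from expected:", is_affected)
```

If the two agree, the installed version most likely does not have the problem described in the question.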

Upvotes: 0

Nickil Maveli

Reputation: 29711

I don't believe this is a bug. The pytz docs clearly indicate that for the US/Eastern timezone there is no way to specify whether a time falls before or after the end-of-daylight-saving-time transition.

In such cases, sticking with UTC seems to be the best option.
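One way to read "sticking with UTC" is to aggregate while the values are in UTC and convert back to the local zone only afterwards. The sketch below assumes that approach; the fixed date and the names s, first_utc and first_local are illustrative only.

```python
import pandas as pd

stamps = pd.date_range("2016-09-08 07:00", "2016-09-08 09:00",
                       freq="30min", tz="US/Eastern")
s = pd.Series(stamps, index=[1, 1, 2, 2, 2])

# Aggregate in UTC so the groupby only ever sees UTC timestamps.
first_utc = s.dt.tz_convert("UTC").groupby(level=0).first()

# Convert the aggregated result back to the zone used for display.
first_local = first_utc.dt.tz_convert("US/Eastern")
print(first_local)
```

The instants are unchanged by the round trip; only the zone used during aggregation differs.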

Excerpt from the pandas docs:

 Be aware that timezones (e.g., pytz.timezone('US/Eastern')) are not
 necessarily equal across timezone versions. So if data is localized to
 a specific timezone in the HDFStore using one version of a timezone
 library and that data is updated with another version, the data will
 be converted to UTC since these timezones are not considered equal.
 Either use the same version of timezone library or use tz_convert with
 the updated timezone definition.

The conversion can be done as follows:

A: using the tz_localize method to localize naive datetimes to UTC

data = pd.date_range("7:00", "9:00", freq="30min").tz_localize('UTC')

B: using the tz_convert method to convert tz-aware data to another time zone

df = pd.DataFrame(index=[1,1,2,2,2], data=data.tz_convert('US/Eastern'))
df.groupby(level=0).first()

which results in:

                          0
1 2016-09-09 07:00:00-04:00
2 2016-09-09 08:00:00-04:00

#0    datetime64[ns, US/Eastern]
#dtype: object
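The difference between the two methods can be seen in a small sketch (the fixed date and variable names are assumptions for illustration): tz_localize attaches a zone to naive timestamps and keeps the wall-clock reading, while tz_convert keeps the instant and changes the wall-clock reading.

```python
import pandas as pd

naive = pd.date_range("2016-09-08 07:00", periods=3, freq="30min")

# tz_localize: attach a zone to naive stamps; the wall-clock reading is kept.
utc = naive.tz_localize("UTC")

# tz_convert: same instants, expressed in another zone (UTC-4 in September).
eastern = utc.tz_convert("US/Eastern")

print(utc[0])      # 2016-09-08 07:00:00+00:00
print(eastern[0])  # 2016-09-08 03:00:00-04:00
```

Note that 07:00 UTC corresponds to 03:00 in US/Eastern during daylight saving time, so localizing to UTC and then converting is not the same as localizing directly to US/Eastern.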

Upvotes: 2
