user15964

Reputation: 2639

pandas groupby datetime.date object is not consistent

For example

import datetime
import pandas as pd

data={'date':[datetime.date(2020,1,i) for i in range(11,13)],
     'a1':range(11,13),
     'a2':range(21,23)}
df=pd.DataFrame(data)

If we group by only the date column, everything works:

g=df.groupby('date')
print(g.groups)
g.get_group(list(g.groups.keys())[0])

gives

{datetime.date(2020, 1, 11): Int64Index([0], dtype='int64'), datetime.date(2020, 1, 12): Int64Index([1], dtype='int64')}

         date  a1  a2
0  2020-01-11  11  21

However, if we group by two columns, so that the group keys form a MultiIndex, there is a problem:

g=df.groupby(['date','a1'])
print(g.groups)
g.get_group(list(g.groups.keys())[0])

gives

{(Timestamp('2020-01-11 00:00:00'), 11): Int64Index([0], dtype='int64'), (Timestamp('2020-01-12 00:00:00'), 12): Int64Index([1], dtype='int64')}

and this error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
in
      1 g=df.groupby(['date','a1'])
      2 print(g.groups)
----> 3 g.get_group(list(g.groups.keys())[0])

~/anaconda3/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in get_group(self, name, obj)
    678         inds = self._get_index(name)
    679         if not len(inds):
--> 680             raise KeyError(name)
    681
    682         return obj.take(inds, axis=self.axis)

KeyError: (Timestamp('2020-01-11 00:00:00'), 11)

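It looks like pandas groupby silently converts the datetime.date objects to Timestamp objects when grouping by more than one column. This breaks the lookup: the keys reported by g.groups cannot be passed back to get_group, so we cannot retrieve the group. Is this a bug?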

Upvotes: 1

Views: 916

Answers (1)

M_S_N

Reputation: 2810

IIUC you can try grouping like this:

g=df.groupby([['date','a1']])
print(g.groups)
g.get_group(list(g.groups.keys())[0])
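Another option, offered only as a sketch: convert the date column to datetime64 before grouping, so that the group keys are Timestamp objects from the start and match what g.groups reports. This assumes it is acceptable for your data to hold the column as datetime64 rather than datetime.date. Be aware that, as far as I can tell, the nested list above is treated as an array of group labels (one per row) rather than as column names, so it may only run here because the frame happens to have two rows.

```python
import datetime
import pandas as pd

df = pd.DataFrame({
    'date': [datetime.date(2020, 1, i) for i in range(11, 13)],
    'a1': range(11, 13),
    'a2': range(21, 23),
})

# Assumption: storing the column as datetime64 instead of
# datetime.date objects is fine for the use case.
df['date'] = pd.to_datetime(df['date'])

g = df.groupby(['date', 'a1'])

# Build the key explicitly as (Timestamp, int), matching the column dtypes.
key = (pd.Timestamp('2020-01-11'), 11)
first = g.get_group(key)
print(first)
```

If you need datetime.date values back afterwards, df['date'].dt.date gives them again after the group lookup.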

Upvotes: 1
