Reputation: 2639
For example
import datetime
import pandas as pd
data={'date':[datetime.date(2020,1,i) for i in range(11,13)],
'a1':range(11,13),
'a2':range(21,23)}
df=pd.DataFrame(data)
If we group by the date column alone, everything works:
g=df.groupby('date')
print(g.groups)
g.get_group(list(g.groups.keys())[0])
gives
{datetime.date(2020, 1, 11): Int64Index([0], dtype='int64'), datetime.date(2020, 1, 12): Int64Index([1], dtype='int64')}
date a1 a2
0 2020-01-11 11 21
However, if we group by two columns, so that the group keys form a MultiIndex, we run into a problem:
g=df.groupby(['date','a1'])
print(g.groups)
g.get_group(list(g.groups.keys())[0])
gives
{(Timestamp('2020-01-11 00:00:00'), 11): Int64Index([0], dtype='int64'), (Timestamp('2020-01-12 00:00:00'), 12): Int64Index([1], dtype='int64')}
and the following error message:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
in
      1 g=df.groupby(['date','a1'])
      2 print(g.groups)
----> 3 g.get_group(list(g.groups.keys())[0])

~/anaconda3/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in get_group(self, name, obj)
    678         inds = self._get_index(name)
    679         if not len(inds):
--> 680             raise KeyError(name)
    681
    682         return obj.take(inds, axis=self.axis)

KeyError: (Timestamp('2020-01-11 00:00:00'), 11)
We can see that pandas groupby silently converts the datetime.date objects to Timestamp objects in the keys of g.groups. This breaks the lookup: get_group raises a KeyError for a key that g.groups itself reported, so we cannot retrieve the group. Is it a bug?
Upvotes: 1
Views: 916
Reputation: 2810
If I understand correctly, you can try grouping like this:
g=df.groupby([['date','a1']])
print(g.groups)
g.get_group(list(g.groups.keys())[0])
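Another option, sketched here under the assumption that it is acceptable to store the dates as datetime64 values rather than datetime.date objects, is to convert the column with pd.to_datetime before grouping. The keys reported by g.groups and the keys used internally for the lookup are then both Timestamp objects, so get_group should find the group. The names first_key and first_group are only illustrative.

```python
import datetime
import pandas as pd

data = {
    'date': [datetime.date(2020, 1, i) for i in range(11, 13)],
    'a1': range(11, 13),
    'a2': range(21, 23),
}
df = pd.DataFrame(data)

# Assumption: the column may hold datetime64 values instead of datetime.date objects.
df['date'] = pd.to_datetime(df['date'])

# Group by both columns; the group keys are (Timestamp, int) tuples.
g = df.groupby(['date', 'a1'])

# Take the first key reported by g.groups and look its group up again.
first_key = list(g.groups.keys())[0]
first_group = g.get_group(first_key)

print(first_key)
print(first_group)
```

If the original datetime.date objects are needed afterwards, they can be recovered with df['date'].dt.date.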
Upvotes: 1