Reputation: 477
If I set the following DF and dictionary (pandas 0.22.0):
import numpy as np
import pandas as pd

kwargs = {
    'index': ['11-1000', '11-1001', '11-1002'],
    'data': np.random.randint(5, size=(3, 2)),
    'columns': ['A', 'B']
}
df = pd.DataFrame(**kwargs)
df
>>          A  B
>> 11-1000  2  1
>> 11-1001  1  4
>> 11-1002  2  3
and
by = {'11-1001': '11-1000', '11-1002': '11-1000'}
and want to group by this dictionary, the result seems incorrect:
df.groupby(by=by, level=0).get_group('11-1000')
>> A B
>> 11-1000 2 1
when I'm expecting something like
>> A B
>> 11-1001 1 4
>> 11-1002 2 3
If I have a MultiIndex to start with, though:
df = df.set_index('A', append=True)
df
>> B
>> A
>> 11-1000 2 1
>> 11-1001 1 4
>> 11-1002 2 3
then it seems like groupby gives me what I want:
df.groupby(by=by, level=0).get_group('11-1000')
>> B
>> A
>> 11-1001 1 4
>> 11-1002 2 3
Any thoughts on this? I almost always use groupby with a MultiIndex, so I've not seen this behavior before and am not sure whether it's normal. How can I get my desired behavior without a MultiIndex?
Upvotes: 3
Views: 1836
Reputation: 1375
I believe the behavior of the level parameter is not particularly well-defined without a MultiIndex.
Passing level=None (which is the default) gets the behavior you want.
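As a minimal sketch of that suggestion, here is the question's setup with the level argument left out. The fixed values are copied from the printed frame in the question (the original data was random), and I'm assuming a reasonably recent pandas; I haven't checked this against 0.22.0 specifically.

```python
import pandas as pd

# Fixed values taken from the frame printed in the question
# (the original used np.random.randint, so values there vary per run).
df = pd.DataFrame(
    data=[[2, 1], [1, 4], [2, 3]],
    index=['11-1000', '11-1001', '11-1002'],
    columns=['A', 'B'],
)

# Mapping from index label to group key, as in the question.
by = {'11-1001': '11-1000', '11-1002': '11-1000'}

# No `level` argument here, so the dict is applied to the index labels.
# '11-1000' is not itself a key in `by`, so that row falls into no group.
result = df.groupby(by=by).get_group('11-1000')
print(result)
```

This should return the rows labelled 11-1001 and 11-1002. If you also want the 11-1000 row to land in its own group, add '11-1000': '11-1000' to the dictionary.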
Upvotes: 2