Ironbeard

Reputation: 477

Pandas GroupBy without MultiIndex

If I set the following DF and dictionary (pandas 0.22.0):

import numpy as np
import pandas as pd

kwargs = {
  'index': ['11-1000', '11-1001', '11-1002'],
  'data': np.random.randint(5, size=(3,2)),
  'columns': ['A', 'B']
}
df = pd.DataFrame(**kwargs)

df         A  B
>> 11-1000 2  1
>> 11-1001 1  4
>> 11-1002 2  3

and

by = {'11-1001': '11-1000', '11-1002': '11-1000'}

and want to group by this dictionary, the result seems incorrect:

df.groupby(by=by, level=0).get_group('11-1000')
>>         A B
>> 11-1000 2 1

when I'm expecting something like

>>         A  B
>> 11-1001 1  4
>> 11-1002 2  3

If I have a MultiIndex to start with, though:

df = df.set_index('A', append=True)
df
>>            B
>>         A
>> 11-1000 2  1
>> 11-1001 1  4
>> 11-1002 2  3

then it seems like groupby gives me what I want:

df.groupby(by=by, level=0).get_group('11-1000')
>>            B
>>         A
>> 11-1001 1  4
>> 11-1002 2  3

Any thoughts on this? I almost always use groupby with a MultiIndex, so I haven't seen this behavior before and I'm not sure whether it's normal. How can I get my desired behavior without a MultiIndex?

Upvotes: 3

Views: 1836

Answers (1)

Victor Chubukov

Reputation: 1375

I believe the behavior of the level parameter is not particularly well-defined without a MultiIndex.

Leaving level at its default of None (that is, omitting it from the call) gets the behavior you want.
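
As a minimal sketch of that suggestion, the snippet below rebuilds the question's frame with the fixed values shown in its printed output (rather than random data, so the result is reproducible) and groups by the dictionary without passing level. The variable name group is only illustrative.

```python
import pandas as pd

# Same shape as the question's frame, using the values from its printed output.
df = pd.DataFrame(
    [[2, 1], [1, 4], [2, 3]],
    index=['11-1000', '11-1001', '11-1002'],
    columns=['A', 'B'],
)
by = {'11-1001': '11-1000', '11-1002': '11-1000'}

# With level omitted, the dict maps each index label to its group key.
# '11-1000' is not a key in the dict, so that row gets no group key
# and is left out of the groups.
group = df.groupby(by=by).get_group('11-1000')
print(group)
```

This should select the rows labelled 11-1001 and 11-1002, which is the result the question expects, with no MultiIndex involved.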

Upvotes: 2
