Pandas groupby adding additional number when as_index=True

Question

I have a dataframe df as follows:

df.to_dict()
Out[24]: 
{'Col1': {0: 'A', 1: 'A', 2: 'B', 3: 'B', 4: 'B'},
 'Col2': {0: 'a', 1: 'a', 2: 'b', 3: 'b', 4: 'c'},
 'Col3': {0: 42, 1: 28, 2: 56, 3: 62, 4: 48}}

I am trying the following:

df2 = df.sort_values('Col3', ascending=False).groupby(['Col1', 'Col2'], as_index=True).apply(lambda x: x.head(1))

However, the index of df2 gets added an additional index variable to show the following:

df2.index
Out[26]: 
MultiIndex([('A', 'a', 0),
            ('B', 'b', 3),
            ('B', 'c', 4)],
           names=['Col1', 'Col2', None])

I need to get the multiindex only with Col1 and Col2 in df2. I am not sure how to resolve the above.

My expected output should be:

df2.index
Out[26]: 
MultiIndex([('A', 'a'),
            ('B', 'b'),
            ('B', 'c')],
           names=['Col1', 'Col2'])

Edit:

I need to use the head() method as I have another scenario where head(4) will be implemented.

BENY · Accepted Answer

Using first

df.sort_values('Col3', ascending=False).groupby(['Col1', 'Col2'], as_index=True).first()

Fix your code reset_index(level=2,drop=True)

df2 = df.sort_values('Col3', ascending=False).groupby(['Col1', 'Col2'], as_index=True).apply(lambda x: x.head(1)).reset_index(level=2,drop=True)

Pandas groupby adding additional number when as_index=True

Answers (1)

Related Questions