Reputation: 4807
I have a dataframe df
as follows:
df.to_dict()
Out[24]:
{'Col1': {0: 'A', 1: 'A', 2: 'B', 3: 'B', 4: 'B'},
'Col2': {0: 'a', 1: 'a', 2: 'b', 3: 'b', 4: 'c'},
'Col3': {0: 42, 1: 28, 2: 56, 3: 62, 4: 48}}
I am trying the following:
df2 = df.sort_values('Col3', ascending=False).groupby(['Col1', 'Col2'], as_index=True).apply(lambda x: x.head(1))
However, the index of df2
gets added an additional index variable to show the following:
df2.index
Out[26]:
MultiIndex([('A', 'a', 0),
('B', 'b', 3),
('B', 'c', 4)],
names=['Col1', 'Col2', None])
I need to get the multiindex only with Col1
and Col2
in df2
. I am not sure how to resolve the above.
My expected output should be:
df2.index
Out[26]:
MultiIndex([('A', 'a'),
('B', 'b'),
('B', 'c')],
names=['Col1', 'Col2'])
Edit:
I need to use the head()
method as I have another scenario where head(4)
will be implemented.
Upvotes: 1
Views: 47
Reputation: 323236
Using first
df.sort_values('Col3', ascending=False).groupby(['Col1', 'Col2'], as_index=True).first()
Fix your code reset_index(level=2,drop=True)
df2 = df.sort_values('Col3', ascending=False).groupby(['Col1', 'Col2'], as_index=True).apply(lambda x: x.head(1)).reset_index(level=2,drop=True)
Upvotes: 2