Reputation: 21991
In this dataframe:
region area other
alabama 99151.5 0.564506436
alabama 99151.5 0.193809515
arkansas 165927 0.878569179
arkansas 165927 0.00946268
arkansas 165927 0.075263353
colorado 408747 0.62052038
colorado 408747 0.723038731
georgia 117363 0.970624899
georgia 117363 0.534441671
idaho 198303 0.378282313
idaho 198303 0.836349349
I want to keep the top 2 regions by area. However, I cannot use the pandas nlargest command, since it does not let me keep the duplicate values in the area column. How do I do this?
-- EDIT:
Expected output:
region area other
colorado 408747 0.62052038
colorado 408747 0.723038731
idaho 198303 0.378282313
idaho 198303 0.836349349
Upvotes: 0
Views: 751
Reputation: 323326
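You may need to take the two largest distinct area values with drop_duplicates and nlargest, then keep every row that matches them with isin:
df[df['area'].isin(df['area'].drop_duplicates().nlargest(2))]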
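For the expected output in the question (every row of the two regions with the largest area), one possible sketch is to pick the two largest distinct area values and filter on them. The data below is copied from the question. The names top_areas and result are only illustrative, and the sketch assumes each region has a single area value, so two regions sharing an area would both be kept.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "region": ["alabama", "alabama", "arkansas", "arkansas", "arkansas",
               "colorado", "colorado", "georgia", "georgia", "idaho", "idaho"],
    "area": [99151.5, 99151.5, 165927, 165927, 165927,
             408747, 408747, 117363, 117363, 198303, 198303],
    "other": [0.564506436, 0.193809515, 0.878569179, 0.00946268, 0.075263353,
              0.62052038, 0.723038731, 0.970624899, 0.534441671,
              0.378282313, 0.836349349],
})

# Two largest distinct area values (illustrative name).
top_areas = df["area"].drop_duplicates().nlargest(2)

# Keep every row whose area is one of those values.
# Boolean filtering preserves the original row order.
result = df[df["area"].isin(top_areas)]
print(result)
```

Because the filter works on area values rather than row counts, it keeps all duplicate rows no matter how many rows each region has.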
Upvotes: 3