user308827

Reputation: 21991

Keep N largest rows even when duplicates are present in a pandas DataFrame

In this dataframe:

region    area     other
alabama   99151.5  0.564506436
alabama   99151.5  0.193809515
arkansas  165927   0.878569179
arkansas  165927   0.00946268
arkansas  165927   0.075263353
colorado  408747   0.62052038
colorado  408747   0.723038731
georgia   117363   0.970624899
georgia   117363   0.534441671
idaho     198303   0.378282313
idaho     198303   0.836349349

I want to keep all rows for the 2 regions with the largest area. However, I cannot use the pandas nlargest command, since it does not let me keep the duplicate values in the area column. How do I do this?

-- EDIT:

Expected output:

region    area     other
colorado  408747   0.62052038
colorado  408747   0.723038731
idaho     198303   0.378282313
idaho     198303   0.836349349

Upvotes: 0

Views: 751

Answers (1)

BENY

Reputation: 323326

You may need to call sort_values before groupby and head:

df.sort_values(['area','other']).groupby('area').head(2)
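
If the goal is the expected output from the question (every row for the two regions with the largest area), one alternative is to pick the top regions first and then filter on them. This is only a minimal sketch: it assumes each region has a single area value, as in the sample data, and the names n, top_regions, and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "region": ["alabama", "alabama", "arkansas", "arkansas", "arkansas",
               "colorado", "colorado", "georgia", "georgia", "idaho", "idaho"],
    "area": [99151.5, 99151.5, 165927, 165927, 165927,
             408747, 408747, 117363, 117363, 198303, 198303],
    "other": [0.564506436, 0.193809515, 0.878569179, 0.00946268, 0.075263353,
              0.62052038, 0.723038731, 0.970624899, 0.534441671,
              0.378282313, 0.836349349],
})

n = 2  # number of regions to keep (illustrative name)

# Reduce to one row per region so repeated areas do not crowd out
# other regions, then take the n regions with the largest area.
# Assumption: a region always has the same area on every row.
top_regions = df.drop_duplicates("region").nlargest(n, "area")["region"]

# Keep every original row that belongs to one of those regions.
result = df[df["region"].isin(top_regions)]
print(result)
```

On the sample data this leaves the colorado and idaho rows in their original order. By contrast, groupby('area').head(2) keeps up to two rows for every distinct area value, so it seems better suited to capping the rows per group than to choosing the top regions.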

Upvotes: 3
