Reputation: 1167

Pandas dataframe select entire rows with highest values from a specified column

I have a dataframe where I want to return the full row that contains the largest values out of a specified column. So let's say I create a dataframe like this:

df = pd.DataFrame(np.random.randint(0,100,size=(25, 4)), columns=list('ABCD'))

Then I'd have a table like this (sorry I can't get a proper table to form, so I just made a short one up):

A    B    C    D
14   67   35   22
75   21   34   64

And let's say it goes on for 25 rows like that. I want to take the top 5 largest values of column C and return those full rows.

If I do:

df['C'].nlargest()

it returns those 5 largest values, but I want it to return the full row.

I thought the below would work, but it gives me an error of "IndexError: indices are out-of-bounds":

df[df['C'].nlargest()]

I know this will be an easy solution for many people here, but it's stumped me. Thanks for your help.

Upvotes: 4

Answers (4)

BENY

Reputation: 323376

without using nlargest, by using sort_values

df.sort_values('C',ascending=False).iloc[:5,]

or using head

df.sort_values('C',ascending=False).head(5)

or using quantile

df[df.C>df.C.quantile(1-(5/len(df)))]

Upvotes: 4

Divakar

Reputation: 221714

Approach #1 One approach -

df.iloc[df.C.argsort()[::-1][:5]]

With simplified slicing, reduces to -

df.iloc[df.C.argsort()[:-6:-1]]

Approach #2 For performance, if the order of those largest n rows is not important, we can also use np.argpartition -

df.iloc[df.C.values.argpartition(-5)[:-6:-1]]

Upvotes: 3

gold_cy

Reputation: 14236

Quick and dirty

df.where(df.C.nlargest()).dropna()

       A     B     C     D
7   98.0  52.0  93.0  65.0
13  76.0  20.0  86.0  68.0
16  83.0   6.0  92.0  51.0
22  97.0  15.0  84.0   8.0
24  32.0  80.0  87.0  34.0

Upvotes: 1

MaxU - stand with Ukraine

Reputation: 210972

you want to use columns parameter:

In [53]: df.nlargest(5, columns=['C'])
Out[53]:
     A   B   C   D
17  43  91  95  32
18  13  36  81  56
7   61  90  76  85
16  68  21  73  68
14   3  64  71  59

Upvotes: 14

Pandas dataframe select entire rows with highest values from a specified column

Answers (4)

Related Questions