ben890

Reputation: 1133

Subset data frame to only include the nth highest value of a column

I have a data frame abc and would like to subset it so that it contains only the row with the nth highest value of a variable "z". I know a simple solution would be:

       library(plyr)
       # sort descending so that row n holds the nth highest z
       abc <- arrange(abc, desc(z))
       abc <- abc[n,]

But is there a way to do it without first ordering the data frame? I'm only asking because ordering seems to be expensive on larger data frames.

Here's an example df to work with:

    x  y   z
1   2  1 111
2   3  2 112
3   4  3 113
4   5  4 114
5   6  5 115
6   7  6 116
7   8  7 117
8   9  8 118
9  10  9 119
10 11 10 120
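
For anyone who wants to run the code, the example can be rebuilt like this. It is a minimal sketch that assumes the columns are plain integer sequences, as the printed values suggest:

```r
# Rebuild the example data frame shown above
# (assumption: x, y and z are consecutive integers)
abc <- data.frame(x = 2:11, y = 1:10, z = 111:120)
abc
```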

Upvotes: 0

Views: 1024

Answers (1)

akrun

Reputation: 886948

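You may try the following, which keeps the row whose z has rank n counting up from the smallest value (assuming no ties in z):

    library(dplyr)
    n <- 7
    # which() turns the rank test into the matching row position
    slice(abc, which(rank(z) == n))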

Or as @nicola commented, a base R option would be

    abc[rank(abc$z) == n, ]

Update

If you want the nth highest value instead, rank from the largest value down by negating z:

    slice(abc, which(rank(-z) == n))
    #  x y   z
    #1 5 4 114
    abc[nrow(abc) - rank(abc$z) + 1 == n, ]
    #  x y   z
    #4 5 4 114
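
Since the question is about avoiding a full sort, note that rank() also sorts internally. One possible alternative is the partial argument of base R's sort(), which only guarantees that the requested position holds its sorted value. This is a sketch rather than a benchmarked solution. The names k, nth_highest and result are just illustrative, and the data frame is rebuilt on the assumption that it matches the example in the question:

```r
# Example data assumed to match the question's data frame
abc <- data.frame(x = 2:11, y = 1:10, z = 111:120)
n <- 7

# The nth highest value sits at position k when z is in ascending order
k <- nrow(abc) - n + 1

# Partial sort: only element k is guaranteed to be in its sorted position
nth_highest <- sort(abc$z, partial = k)[k]

# Keep the row(s) whose z equals that value; ties in z return several rows
result <- abc[abc$z == nth_highest, ]
result
```

Whether this is actually faster than rank() on your data is worth measuring. The logical comparison still scans the whole column once.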

Upvotes: 3
