How to select top n values from a data frame retaining the duplicates in R

Question

Let's say my sample data looks as below.

The freq column gives the frequency of each id. I want the rows with the top 3 distinct frequencies, keeping any ties. The output should be:

I used the following code.

d$rank <- rank(-d$freq, ties.method = "min")

where d is my data frame. I used rank so that I could later select the top 3 frequencies. The output I got is:

id freq rank
 1    4    1
 2    3    2
 3    2    3
 4    2    3
 5    1    5

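The problem is that rank 4 is missing: with ties.method = "min", the two tied rows both get rank 3 and the next row jumps to rank 5. I want consecutive ranks with no gaps, so that I can handle the many duplicated values in my original data frame. Any help is appreciated.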

Thanks.

akrun · Accepted Answer

Assuming that 'freq' is sorted in descending order, we get the unique elements of 'freq', select the first 3 with head, use %in% to get a logical index of the rows whose 'freq' is among those values, and subset the rows.

subset(df1, freq %in% head(unique(freq),3))
#  id freq
#1  1    4
#2  2    3
#3  3    2
#4  4    2

If we want to use ranks, dense_rank from dplyr is an option. It assigns consecutive ranks with no gaps after ties:

library(dplyr)
df1 %>%
    filter(dense_rank(-freq) < 4)

Another option is frank from data.table (contributed by @David Arenburg):

library(data.table)
setDT(df1)[, .SD[frank(-freq, ties.method = "dense") < 4]]
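If neither package is available, a dense rank can also be approximated in base R by matching each value against the sorted distinct values. This is a sketch rather than part of the original answer; the sample data is again reconstructed from the question's rank table.

```r
# Sample data reconstructed from the rank table in the question (an assumption).
df1 <- data.frame(id = 1:5, freq = c(4, 3, 2, 2, 1))

# The position of each freq among the sorted distinct values is its dense rank:
# ties share a rank and the next distinct value gets the next integer.
df1$rank <- match(df1$freq, sort(unique(df1$freq), decreasing = TRUE))

# Rows with the top 3 distinct frequencies.
df1[df1$rank < 4, ]
```

With this data the ranks come out as 1, 2, 3, 3, 4, so rank 4 is no longer skipped.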
