Reputation: 3
Let's say my sample data looks like this:
id freq
1 4
2 3
3 2
4 2
5 1
The freq column gives the frequency of each id. I want the rows with the top 3 frequencies, so the output should be:
id freq
1 4
2 3
3 2
4 2
I used the following code.
d$rank <- rank(-d$freq,ties.method="min")
where d is my data frame. I used the rank command so that I can later select the top 3 frequencies.
The output I got is:
id freq rank
1 4 1
2 3 2
3 2 3
4 2 3
5 1 5
The problem is that rank 4 is missing. I want consecutive ranks with no gaps, because my original data frame has many duplicated values. Any help is appreciated.
Thanks.
Upvotes: 0
Views: 1931
Reputation: 886998
Assuming that 'freq' is sorted in descending order, we get the unique elements of 'freq', select the first 3 with head, use %in% to get a logical index of the rows whose 'freq' is among those values, and subset the rows.
subset(df1, freq %in% head(unique(freq),3))
# id freq
#1 1 4
#2 2 3
#3 3 2
#4 4 2
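If the rows are not already sorted by 'freq', the same idea should still work once the unique values are sorted explicitly. A minimal sketch, assuming the example data is rebuilt as df1 in shuffled order (the construction and the names top_vals and res are only for illustration):

```r
# Example data from the question, deliberately shuffled (illustrative only)
df1 <- data.frame(id = c(5, 3, 1, 4, 2), freq = c(1, 2, 4, 2, 3))

# Sort the distinct frequencies in descending order and keep the top 3
top_vals <- head(sort(unique(df1$freq), decreasing = TRUE), 3)

# Keep every row whose freq is one of those values, then order for display
res <- df1[df1$freq %in% top_vals, ]
res <- res[order(-res$freq, res$id), ]
res
```

Sorting the unique values first removes the dependence on row order, at the cost of one extra sort.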
If we are using rank, then dense_rank from dplyr will be an option:
library(dplyr)
df1 %>%
filter(dense_rank(-freq) < 4)
Or another option using frank from data.table (contributed by @David Arenburg):
library(data.table)
setDT(df1)[, .SD[frank(-freq, ties.method = "dense") < 4]]
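A dense rank can also be computed in base R without extra packages, by matching each value against the sorted distinct values. A minimal sketch, assuming the question's data frame d is rebuilt as below (the construction and the name top3 are illustrative, not part of the original post):

```r
# Example data from the question (illustrative reconstruction)
d <- data.frame(id = 1:5, freq = c(4, 3, 2, 2, 1))

# Dense rank: position of each freq among the sorted distinct values,
# so tied values share a rank and no rank is skipped
d$rank <- match(d$freq, sort(unique(d$freq), decreasing = TRUE))

# Keep the rows whose dense rank is 3 or lower
top3 <- d[d$rank <= 3, ]
top3
```

Here the ranks come out as 1, 2, 3, 3, 4, so the gap produced by ties.method = "min" does not appear.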
Upvotes: 1
Reputation: 70256
Here's another base R approach:
df[cumsum(!duplicated(df$freq))<4,]
# id freq
#1 1 4
#2 2 3
#3 3 2
#4 4 2
This assumes the data is already in descending order (as in the example).
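To see why this works, and to drop the ordering assumption, the steps can be spelled out one at a time. A rough sketch, assuming the example data is rebuilt as df in shuffled order (the names first_seen, grp and res are hypothetical helpers):

```r
# Example data from the question, in shuffled order (illustrative only)
df <- data.frame(id = c(3, 1, 5, 2, 4), freq = c(2, 4, 1, 3, 2))

# Sort by descending freq first so the ordering assumption holds
df <- df[order(-df$freq), ]

# TRUE at the first occurrence of each distinct freq value
first_seen <- !duplicated(df$freq)

# Running count of distinct values seen so far; tied rows share a count
grp <- cumsum(first_seen)

# Keep rows belonging to the first 3 distinct frequencies
res <- df[grp < 4, ]
res
```

The running count behaves like a dense rank on sorted data, which is why filtering on it keeps all tied rows.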
If you're going to use external libraries like dplyr, I'd suggest using top_n:
library(dplyr)
top_n(df, 3, freq)
# id freq
#1 1 4
#2 2 3
#3 3 2
#4 4 2
Upvotes: 2