xirururu

Reputation: 5508

How can I get the top n values with their indices in R?

I have a data frame with just one column, and I want to find the three largest values together with their indices. For example, my data frame df looks like:

  distance
1        1
2        4
3        2
4        3
5        4
6        5
7        5

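I want to find the largest 3 values with their indices. Tied values should all be kept, so my expected result (every row whose value is among the three largest distinct values) is: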

  distance
6        5
7        5
2        4
5        4
4        3

How can I do this? Since I have just one column, is it also possible with a list instead of a data frame?

Upvotes: 3

Views: 10611

Answers (7)

micah

Reputation: 1218

To get the top percentage (proportion) of rows by any column, use dplyr's slice_max():

library(dplyr)
df <- df %>% slice_max(IndexCol, prop = .25)

or, within each group:

df <- df %>% group_by(col1, col2) %>% slice_max(IndexCol, prop = .25)

https://dplyr.tidyverse.org/reference/slice.html
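For readers without dplyr, here is a rough base-R sketch of the same idea. The helper name top_prop is hypothetical, and it differs from slice_max() in one respect: slice_max() keeps ties at the cut-off by default (with_ties = TRUE), whereas this version returns exactly floor(prop * nrow(df)) rows.

```r
# Hypothetical base-R helper: keep the top `prop` fraction of rows by `col`.
# The row count is rounded down; ties at the cut-off are NOT kept
# (unlike slice_max(), whose default is with_ties = TRUE).
top_prop <- function(df, col, prop) {
  n <- floor(prop * nrow(df))
  ord <- order(df[[col]], decreasing = TRUE)
  df[ord[seq_len(n)], , drop = FALSE]
}

df <- data.frame(distance = c(1, 4, 2, 3, 4, 5, 5))
res <- top_prop(df, "distance", 0.5)   # floor(3.5) = 3 rows
res
```

Because the rows are subset with `[`, the original row names (the indices) are preserved in the result.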

Upvotes: 0

Manos Papadakis

Reputation: 593

You can use the nth() function from the Rfast package to get either the indices or the values:

> x=runif(100000)
> num.of.nths <- 3
> Rfast2::benchmark(a<-Rfast::nth(x,3,num.of.nths,TRUE,TRUE),b<-order(x,decreasing = T)[1:3],times = 10)
   milliseconds 
                                        min     mean     max
a <- Rfast::nth(x, 3, 3, TRUE, TRUE) 1.6483  2.12419  3.1238
b <- order(x, decreasing = T)[1:3]   6.8648 12.31633 27.1988
> 
> a
      [,1]
[1,]  8058
[2,] 63946
[3,] 17556
> b
[1]  8058 63946 17556

Upvotes: 1
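If installing Rfast is not an option, base R can also avoid fully ordering the vector. The sketch below (the helper name top_k_index is hypothetical) uses sort(..., partial = ) to find the k-th largest value as a threshold and then orders only the candidates at or above it. It assumes x has no NAs and k <= length(x).

```r
# Hypothetical helper: indices of the k largest values of x, largest first.
# Assumes x has no NAs and 1 <= k <= length(x).
top_k_index <- function(x, k) {
  n <- length(x)
  # A partial sort guarantees that element n - k + 1 is in its sorted
  # position, i.e. it holds the k-th largest value.
  threshold <- sort(x, partial = n - k + 1)[n - k + 1]
  candidates <- which(x >= threshold)   # more than k if there are ties
  ranked <- candidates[order(x[candidates], decreasing = TRUE)]
  ranked[seq_len(k)]
}

set.seed(1)
x <- runif(100000)
idx <- top_k_index(x, 3)
x[idx]   # the three largest values
```

Only the (usually few) candidates are passed to order(), which is the point of the partial sort; whether this beats a plain order() call in practice would need benchmarking.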

Minstein

Reputation: 582

Using the data.table package is a faster solution, because setorder() is faster than order() and sort():

library(data.table)

select_top_n<-function(scores,n_top){
    d <- data.frame(
        x   = copy(scores),
        indice=seq(1,length(scores)))
    
    setDT(d)
    setorder(d,-x)
    n_top_indice<-d$indice[1:n_top]
    return(n_top_indice)
}


select_top_n2<-function(scores,n_top){
    
    n_top_indice<-order(-scores)[1:n_top]
    return(n_top_indice)
}

select_top_n3<-function(scores,n_top){
    
    n_top_indice<-sort(scores, index.return=TRUE, decreasing=TRUE)$ix[1:n_top]
    return(n_top_indice)
}

Testing:

set.seed(123)
s=runif(100000)

library(microbenchmark)
mbm<-microbenchmark(
    ind1 = select_top_n(s,100),
    ind2=select_top_n2(s,100),
    ind3=select_top_n3(s,100),
    times = 10L
)

Output:

Unit: milliseconds
 expr       min       lq      mean    median        uq       max neval
 ind1  5.824576  5.98959  6.209746  6.052658  6.270312  7.422736    10
 ind2  9.627950 10.08661 10.274867 10.377451 10.560912 10.588223    10
 ind3 10.397383 11.32129 12.087122 12.498817 12.856840 13.155845    10

See also the related question "Getting the top values by group".

Upvotes: 1

komal sharma

Reputation: 11

If you want to sort by a single column in decreasing order and keep the row names of the top N rows:

N <- 3
row_names <- rownames(df)
indexes <- order(df$ColumnName, decreasing = TRUE)[1:N]

# row names of the top N rows, largest value first
result <- row_names[indexes]

result

Here the row names of the top N rows are saved in the 'result' vector; since they are the original row indices, this returns the indices as well.

Upvotes: 1
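The question also asks about a list instead of a data frame. A minimal sketch, assuming the values are held in a plain list: flatten it with unlist(), use the positions as names, and sort. sort() keeps the names, so they serve as the indices.

```r
# The question's values stored as a list instead of a data frame
lst <- list(1, 4, 2, 3, 4, 5, 5)

distance <- unlist(lst)                  # flatten to a numeric vector
names(distance) <- seq_along(distance)   # positions become names

# sort() keeps names, so the names of the result are the original indices
top3 <- head(sort(distance, decreasing = TRUE), 3)
top3
```

This returns exactly three elements; to keep ties as in the question's expected output, the sort(index.return = TRUE) approach can be applied to the same vector.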

akrun

Reputation: 886958

We can use sort with index.return=TRUE to return the sorted values together with their indices in a list. Then we subset both list elements, keeping only the entries whose value is among the first 3 unique elements of 'x'.

lst <- sort(df$distance, index.return=TRUE, decreasing=TRUE)
lapply(lst, `[`, lst$x %in% head(unique(lst$x),3))
#$x
#[1] 5 5 4 4 3

#$ix
#[1] 6 7 2 5 4

Upvotes: 13
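The same logic can be wrapped in a small reusable function. The name top_n_with_ties is hypothetical; it returns a data frame pairing each original index with its value, which also works directly on a vector.

```r
# Hypothetical helper wrapping the sort(index.return = TRUE) approach:
# every element whose value is among the n largest distinct values,
# largest first, together with its original position.
top_n_with_ties <- function(x, n) {
  s <- sort(x, index.return = TRUE, decreasing = TRUE)
  keep <- s$x %in% head(unique(s$x), n)
  data.frame(index = s$ix[keep], value = s$x[keep])
}

res <- top_n_with_ties(c(1, 4, 2, 3, 4, 5, 5), 3)
res
```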

SabDeM

Reputation: 7190

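A somewhat clumsy version that keeps every row whose value is among the three largest distinct values:

ord <- order(df$distance, decreasing = TRUE)
top3 <- head(sort(unique(df$distance), decreasing = TRUE), 3)
df[ord[df$distance[ord] %in% top3], , drop = FALSE]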
  distance
6        5
7        5
2        4
5        4
4        3

Upvotes: 2

Theodor

Reputation: 1026

df[order(df[[1]], decreasing = TRUE)[1:3], , drop = FALSE]

If you have more columns, order by the column you need:

 df[order(df$column_name, decreasing=TRUE)[1:3],,drop=FALSE]

Upvotes: 1
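Note that order(...)[1:3] returns exactly three rows, so the second row with distance 4 (row 5) is dropped. If rows tied with the third-largest value should be kept too, one base-R sketch is to use that value as a cut-off. This keeps ties at the boundary only; it is not the same as keeping the three largest distinct values.

```r
df <- data.frame(distance = c(1, 4, 2, 3, 4, 5, 5))
k <- 3

ord <- order(df$distance, decreasing = TRUE)
cutoff <- df$distance[ord[k]]   # value of the k-th largest row (here 4)

# keep every row at or above the cut-off, in decreasing order
res <- df[ord[df$distance[ord] >= cutoff], , drop = FALSE]
res
```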
