Reputation: 5508
I have a data frame with just one column, and I want to find the three largest values together with their indices. For example, my data frame df
looks like:
distance
1 1
2 4
3 2
4 3
5 4
6 5
7 5
I want to find the three largest values with their indices, so my expected result is:
distance
6 5
7 5
2 4
5 4
4 3
How can I do this? Since I have just one column, is it also possible with a list instead of a data frame?
Upvotes: 3
Views: 10611
Reputation: 1218
Get top percentage (proportion) of any column
library(dplyr)
df <- df %>% slice_max(IndexCol, prop = .25)
or by a group
df <- df %>% group_by(col1, col2) %>% slice_max(IndexCol, prop = .25)
https://dplyr.tidyverse.org/reference/slice.html
Upvotes: 0
Reputation: 593
You can use the function nth from the package Rfast to get either the indices or the values
> x=runif(100000)
> num.of.nths <- 3
> Rfast2::benchmark(a<-Rfast::nth(x,3,num.of.nths,TRUE,TRUE),b<-order(x,decreasing = T)[1:3],times = 10)
milliseconds
min mean max
a <- Rfast::nth(x, 3, 3, TRUE, TRUE) 1.6483 2.12419 3.1238
b <- order(x, decreasing = T)[1:3] 6.8648 12.31633 27.1988
>
> a
[,1]
[1,] 8058
[2,] 63946
[3,] 17556
> b
[1] 8058 63946 17556
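If Rfast is not available, a similar "select without fully sorting" idea can be sketched in base R with the partial argument of sort. This is only an illustration under the assumption that the values contain no NA; the variable names (kth_largest, idx) are made up for this example and no timings are claimed for it.

```r
# Hypothetical illustration: top-k indices via a partial sort in base R.
set.seed(1)
x <- runif(1000)
k <- 3
n <- length(x)

# A partial sort only guarantees that position n - k + 1 holds the value
# it would hold after a full ascending sort, i.e. the k-th largest value.
kth_largest <- sort(x, partial = n - k + 1)[n - k + 1]

# Positions of all values at or above that threshold,
# then ordered from largest to smallest.
idx <- which(x >= kth_largest)
idx <- idx[order(x[idx], decreasing = TRUE)]
print(idx)
```

With tied values at the threshold, idx can contain more than k positions, which matches the behaviour asked for in the question.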
Upvotes: 1
Reputation: 582
Using the data.table library is a faster solution because setorder is faster than order and sort:
library(data.table)
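# data.table approach: sort by reference, then take the first n_top indices.
select_top_n <- function(scores, n_top) {
  d <- data.frame(
    x = copy(scores),
    indice = seq(1, length(scores)))
  setDT(d)
  setorder(d, -x)
  n_top_indice <- d$indice[1:n_top]
  return(n_top_indice)
}
# Base R order().
select_top_n2 <- function(scores, n_top) {
  n_top_indice <- order(-scores)[1:n_top]
  return(n_top_indice)
}
# Base R sort() with index.return; uses the 'scores' argument, not the global 's'.
select_top_n3 <- function(scores, n_top) {
  n_top_indice <- sort(scores, index.return = TRUE, decreasing = TRUE)$ix[1:n_top]
  return(n_top_indice)
}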
Testing:
set.seed(123)
s=runif(100000)
library(microbenchmark)
mbm<-microbenchmark(
ind1 = select_top_n(s,100),
ind2=select_top_n2(s,100),
ind3=select_top_n3(s,100),
times = 10L
)
Output:
Unit: milliseconds
expr min lq mean median uq max neval
ind1 5.824576 5.98959 6.209746 6.052658 6.270312 7.422736 10
ind2 9.627950 10.08661 10.274867 10.377451 10.560912 10.588223 10
ind3 10.397383 11.32129 12.087122 12.498817 12.856840 13.155845 10
Refer to Getting the top values by group
Upvotes: 1
Reputation: 11
If you want to sort by one column in decreasing order and keep the top N rows:
N <- 3  # number of top rows to keep
row_names <- rownames(df)
indexes <- order(df$ColumnName, decreasing = TRUE)[1:N]
# Row names of the top N rows, largest value first
result <- row_names[indexes]
result
Here, the row names of the top N rows are saved in the 'result' vector, and 'indexes' holds their positions.
Upvotes: 1
Reputation: 886958
We can use sort with index.return=TRUE to return the values together with their indices in a list. Then we can subset the list based on the first 3 unique elements in 'x'.
lst <- sort(df1$distance, index.return=TRUE, decreasing=TRUE)
lapply(lst, `[`, lst$x %in% head(unique(lst$x),3))
#$x
#[1] 5 5 4 4 3
#$ix
#[1] 6 7 2 5 4
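To get back the one-column layout shown in the question, the list can be turned into a data frame again. The following is a minimal sketch, assuming df1 holds the sample data from the question; the names keep and top3 are made up for this example.

```r
# Sample data from the question (row names 1..7 are implicit).
df1 <- data.frame(distance = c(1, 4, 2, 3, 4, 5, 5))

# Sort in decreasing order, keeping the original positions in $ix.
lst <- sort(df1$distance, index.return = TRUE, decreasing = TRUE)

# Keep every entry whose value is among the three largest distinct values.
keep <- lst$x %in% head(unique(lst$x), 3)

# Rebuild a one-column data frame whose row names are the original indices.
top3 <- data.frame(distance = lst$x[keep], row.names = lst$ix[keep])
print(top3)
```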
Upvotes: 13
Reputation: 7190
A little clumsy version of my previous code:
df[order(df$distance, decreasing = TRUE)[sort(unique(df$distance))], , drop = FALSE]
distance
6 5
7 5
2 4
5 4
4 3
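Note that the one-liner above uses the sorted unique values (1 to 5) as positions into the ordering, so it gives the expected rows here only because the sample values happen to be small consecutive integers. A sketch that does not rely on this, under the assumption that the column has no NA values, could look as follows; top_k_distinct is a hypothetical helper name, and the data are the question's values scaled by 10 to show that the result no longer depends on the values themselves.

```r
# Hypothetical helper: rows whose value is among the k largest distinct values.
top_k_distinct <- function(df, col, k) {
  # Row positions from largest to smallest value.
  ord <- order(df[[col]], decreasing = TRUE)
  # The k largest distinct values.
  top_vals <- head(unique(df[[col]][ord]), k)
  # Keep the ordered rows whose value is one of those.
  df[ord[df[[col]][ord] %in% top_vals], , drop = FALSE]
}

df <- data.frame(distance = c(10, 40, 20, 30, 40, 50, 50))
res <- top_k_distinct(df, "distance", 3)
print(res)
```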
Upvotes: 2
Reputation: 1026
df[order(df, decreasing=TRUE)[1:3],,drop=FALSE]
If you have more columns, order by the column of interest instead:
df[order(df$column_name, decreasing=TRUE)[1:3],,drop=FALSE]
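As for the question of whether this works without a data frame: the same order() idea applies to a plain vector. The sketch below is one possible way, assuming the values are stored in a numeric vector and the positions are kept as names; the variable names are made up for this example. Unlike [1:3], it keeps all ties among the three largest distinct values.

```r
# Sample values from the question, stored as a plain numeric vector.
distance <- c(1, 4, 2, 3, 4, 5, 5)
# Keep the original positions as names so they survive reordering.
names(distance) <- seq_along(distance)

# Reorder from largest to smallest.
ord <- order(distance, decreasing = TRUE)
sorted <- distance[ord]

# The three largest distinct values, then every element equal to one of them.
top_vals <- head(unique(sorted), 3)
top3 <- sorted[sorted %in% top_vals]
print(top3)
```

The original positions can then be read back with as.integer(names(top3)).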
Upvotes: 1