ClementWalter

Reputation: 5272

which.min on a subset of a vector

I'd like to get the index of the minimum of a subset of a vector, expressed as an index into the original vector rather than into the renumbered subset.

For now, I've been using:

L = rnorm(20) # say this is the original vector
subset = runif(20)<0.3 # some conditions to extract the subset
ind_min = which.min(L[subset])
ind_sel = seq(L)[subset]
ind_min = ind_sel[ind_min]

but I guess there should be something more direct or cleaner. I've been thinking of using a trick such as:

L_tmp = L
L_tmp[!subset] = Inf
ind_min = which.min(L_tmp)

which is apparently more efficient:

> microbenchmark(method_1(), method_2(), unit = "relative")
Unit: relative
       expr      min       lq     mean   median       uq      max neval
 method_1() 3.699562 3.249635 3.119666 3.076819 2.928259 3.225849   100
 method_2() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000   100

but I'm not really happy with it, because I suspect there is a more direct way. Any suggestions?
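One difference between the two approaches is worth checking before picking one on speed alone: they behave differently when the condition selects nothing. The sketch below wraps both in small functions (the names `min_index_masked` and `min_index_subset` and the toy data are hypothetical, introduced only for illustration) and compares them on a normal case and on an empty selection.

```r
# Hypothetical helper names and toy data, for illustration only.

# Inf-masking approach: overwrite the excluded entries, then search everything.
min_index_masked <- function(x, keep) {
  x[!keep] <- Inf        # x is a local copy; the caller's vector is unchanged
  which.min(x)
}

# Index-mapping approach: search the subset, then map back to original positions.
min_index_subset <- function(x, keep) {
  which(keep)[which.min(x[keep])]
}

x    <- c(5, 2, 9, 1)
keep <- c(TRUE, FALSE, TRUE, FALSE)

min_index_masked(x, keep)   # 1
min_index_subset(x, keep)   # 1

# Empty selection: the masked vector is all Inf, so which.min() reports
# position 1, whereas the mapping approach returns an empty result.
none <- rep(FALSE, length(x))
min_index_masked(x, none)   # 1
min_index_subset(x, none)   # integer(0)
```

If an empty selection is possible in practice, the masking variant would need an explicit `any(keep)` check before its result can be trusted.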

Upvotes: 2

Views: 1028

Answers (3)

Cath

Reputation: 24074

You can try:

(seq(L))[subset][which.min(L[subset])]

which is similar to your first method but without creating temporary variables.

Benchmark results on a vector L of length 20000:
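As a possible variant of the same idea, `which(subset)` yields the selected positions directly, so it can stand in for `seq(L)[subset]`. The sketch below checks that the two forms agree on random data; the seed is an assumption added for reproducibility and is not part of the original post.

```r
set.seed(1)                      # assumed seed, for reproducibility only
L      <- rnorm(20)              # the original vector
subset <- runif(20) < 0.3        # logical condition selecting the subset

# One-liner from the answer above.
ind_a <- seq(L)[subset][which.min(L[subset])]

# Variant: which(subset) gives the original positions of the selected entries.
ind_b <- which(subset)[which.min(L[subset])]

identical(ind_a, ind_b)

# Caveat: seq() on a length-one numeric vector counts up to its value,
# so seq_along() is the safer way to enumerate positions in general.
seq(3)         # 1 2 3
seq_along(3)   # 1
```

The `which(subset)` form also sidesteps the `seq()` caveat, since it never depends on the values stored in `L`.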

    method_cath<- function(){(seq(L))[subset][which.min(L[subset])]}
    method_FK_corr1 <- function(){min = min(L[subset])
                                  ind_min = intersect(which(L == min), seq(L)[subset])[1]
                                  return(ind_min)} 
    method_FK_corr2 <- function(){min = min(L[subset])
                                  ind_min = intersect(which(L == min), which(subset))[1]
                                  return(ind_min)} 
    method_1clm <- function(){ind_min = which.min(L[subset])
                              ind_sel = seq(L)[subset]
                              ind_min = ind_sel[ind_min]
                              return(ind_min)} 
    method_2clm <- function(){L_tmp = L
                              L_tmp[!subset] = Inf
                              ind_min = which.min(L_tmp)
                              return(ind_min)}

> microbenchmark(method_2clm(), method_cath(), method_1clm(), method_FK_corr2(), method_FK_corr1(), unit = "relative")
   # Unit: relative
   #               expr      min       lq     mean   median       uq       max neval cld
   #      method_2clm() 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000   100 a  
   #      method_cath() 1.312146 1.290370 1.282964 1.278178 1.282424 0.9191693   100  b 
   #      method_1clm() 1.295031 1.294642 1.303781 1.284630 1.279821 1.2977193   100  b 
   #  method_FK_corr2() 1.185821 1.166924 1.278030 1.155217 1.165738 4.9948007   100  b 
   #  method_FK_corr1() 1.683783 1.644797 1.746055 1.635293 1.636195 5.1616672   100   c

NB: I was getting NA as a result with @FedorenkoKristina's original function, so I tested two possible corrected versions; all functions now give the same result.

Upvotes: 5

Fedorenko Kristina

Reputation: 2777

You can find the minimum of L[subset] and then look up its index in L.

L = rnorm(20) # say this is the original vector
subset = runif(20)<0.3 # some conditions to extract the subset
min = min(L[subset])
ind_min = intersect(which(L == min), seq(L)[subset])[1]

Upvotes: 2

tstev

Reputation: 616

You could also use the subset() function, where subset is your condition.

L = rnorm(20) # say this is the original vector
subset = runif(20)<0.3 # some conditions to extract the subset
ind_min = which(L == min(subset(L,subset)))

I guess this is very similar to what Fedorenko Kristina suggested. She was faster than me.
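One caveat with matching on the value: with `rnorm()` data ties are very unlikely, but when the same minimum value also occurs outside the subset, `which(L == ...)` reports those positions too. The small sketch below uses made-up data with a deliberate tie (the result variable names are hypothetical) to show the difference and two ways of restricting the result to the subset.

```r
# Toy data with a deliberate tie: the value 1 appears at positions 2 and 4,
# but only position 4 belongs to the subset.
L      <- c(4, 1, 7, 1, 3)
subset <- c(FALSE, FALSE, TRUE, TRUE, TRUE)

# Matching on the value alone returns every position holding the minimum.
all_hits <- which(L == min(L[subset]))
all_hits                                   # 2 4

# Restricting the matches to the subset, as in the intersect() approach.
hit_intersect <- intersect(all_hits, which(subset))[1]
hit_intersect                              # 4

# Mapping the subset position back to the original vector.
hit_mapped <- which(subset)[which.min(L[subset])]
hit_mapped                                 # 4
```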

Upvotes: 0
