Reversal
Reversal

Reputation: 622

Fast way to find index of an element

I have a unitary segment [0,1] and several points on it, represented by a vector v. Moreover, I have a number k between 0 and 1 and I'd like to know what's the index of the interval the number falls in. I was wondering if exists a smart way instead of cycling on elements.

k <- 0.67 
v <- c( 0.12 , 0.25 , 0.48 , 0.78 , 1)

Upvotes: 1

Views: 67

Answers (3)

potterzot
potterzot

Reputation: 648

I have been working on something very similar, except that would round to the nearest value in the vector instead of returning the index. I've been working on a c++ script with Rcpp. Below I benchmark the three solutions above as well as the cpp script I wrote, which returns the value rather than the index, but would be easy to modify. findInterval seems by far the fastest if you don't count the cpp code:

library(Rcpp)
cppFunction('NumericVector nearest_neighbor(NumericVector x, NumericVector v) {
  int n, m, i, j, k, adj;
            n = v.size();
            m = x.size();

            NumericVector y(m);
            for (int a=0;a<m;a++) {
            if (x[a] < v[0]) {
            return v[0];
            } else if (x[a] >= v[n-1]) {
            return v[n-1];
            }
            i = 0;
            j = 0;
            k = n-1;
            while(!(v[j] <= x[a] && x[a] < v[j+1])) {
            if (x[a] < v[j]) {
            k = j;
            } else {
            i = j+1;
            }
            j = div((k-i), 2).quot + i;
            }
            if (pow(x[a] - v[j+1],2) < pow(x[a] - v[j],2)) {
            adj = 1;
            } else {
            adj = 0;
            }
            y[a] = v[j + adj];
            }
            return y;
            }')


library(microbenchmark)
v <- 1:10000
k <- 6.5
microbenchmark(c(max(which(v<k)),min(which(v>k))))
#> Unit: microseconds
#>                                     expr     min       lq     mean
#>  c(max(which(v < k)), min(which(v > k))) 200.907 202.6625 290.4425
#>    median       uq      max neval
#>  203.8475 234.6495 1789.479   100
microbenchmark(Position(function(x) k < sort(x), v) - 1)
#> Unit: microseconds
#>                                      expr     min       lq     mean
#>  Position(function(x) k < sort(x), v) - 1 262.773 282.9995 296.9819
#>   median      uq     max neval
#>  298.919 305.849 428.715   100
microbenchmark(findInterval(k, v))
#> Unit: microseconds
#>                expr    min      lq     mean median    uq      max neval
#>  findInterval(k, v) 33.199 33.7105 50.62627 34.264 37.04 1483.216   100
microbenchmark(nearest_neighbor(k, v))
#> Unit: microseconds
#>                    expr    min     lq     mean  median     uq      max
#>  nearest_neighbor(k, v) 12.567 13.166 29.45889 13.4945 13.787 1574.045
#>  neval
#>    100

Upvotes: 1

LyzandeR
LyzandeR

Reputation: 37889

One way is the following:

The intervals are n - 1 the length of the vector v (n). So with this in mind you could use Position which returns the position of the first occurrence of function(x) 0.67 < x:

#sort helps in case v is not sorted
Position(function(x) 0.67 < sort(x), v) - 1
[1] 3

The position of the above occurrence is 4 and the interval is the 3rd.

Upvotes: 2

Ian FitzPatrick
Ian FitzPatrick

Reputation: 16

Something like this perhaps: c(max(which(v<k)),min(which(v>k)))

Upvotes: 0

Related Questions