Reputation: 622
I have a unitary segment [0,1] and several points on it, represented by a vector v
. Moreover, I have a number k
between 0 and 1 and I'd like to know what's the index of the interval the number falls in.
I was wondering if exists a smart way instead of cycling on elements.
k <- 0.67
v <- c( 0.12 , 0.25 , 0.48 , 0.78 , 1)
Upvotes: 1
Views: 67
Reputation: 648
I have been working on something very similar, except that would round to the nearest value in the vector instead of returning the index. I've been working on a c++ script with Rcpp. Below I benchmark the three solutions above as well as the cpp script I wrote, which returns the value rather than the index, but would be easy to modify. findInterval
seems by far the fastest if you don't count the cpp code:
library(Rcpp)
cppFunction('NumericVector nearest_neighbor(NumericVector x, NumericVector v) {
int n, m, i, j, k, adj;
n = v.size();
m = x.size();
NumericVector y(m);
for (int a=0;a<m;a++) {
if (x[a] < v[0]) {
return v[0];
} else if (x[a] >= v[n-1]) {
return v[n-1];
}
i = 0;
j = 0;
k = n-1;
while(!(v[j] <= x[a] && x[a] < v[j+1])) {
if (x[a] < v[j]) {
k = j;
} else {
i = j+1;
}
j = div((k-i), 2).quot + i;
}
if (pow(x[a] - v[j+1],2) < pow(x[a] - v[j],2)) {
adj = 1;
} else {
adj = 0;
}
y[a] = v[j + adj];
}
return y;
}')
library(microbenchmark)
v <- 1:10000
k <- 6.5
microbenchmark(c(max(which(v<k)),min(which(v>k))))
#> Unit: microseconds
#> expr min lq mean
#> c(max(which(v < k)), min(which(v > k))) 200.907 202.6625 290.4425
#> median uq max neval
#> 203.8475 234.6495 1789.479 100
microbenchmark(Position(function(x) k < sort(x), v) - 1)
#> Unit: microseconds
#> expr min lq mean
#> Position(function(x) k < sort(x), v) - 1 262.773 282.9995 296.9819
#> median uq max neval
#> 298.919 305.849 428.715 100
microbenchmark(findInterval(k, v))
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> findInterval(k, v) 33.199 33.7105 50.62627 34.264 37.04 1483.216 100
microbenchmark(nearest_neighbor(k, v))
#> Unit: microseconds
#> expr min lq mean median uq max
#> nearest_neighbor(k, v) 12.567 13.166 29.45889 13.4945 13.787 1574.045
#> neval
#> 100
Upvotes: 1
Reputation: 37889
One way is the following:
The intervals are n - 1 the length of the vector v (n). So with this in mind you could use Position
which returns the position of the first occurrence of function(x) 0.67 < x
:
#sort helps in case v is not sorted
Position(function(x) 0.67 < sort(x), v) - 1
[1] 3
The position of the above occurrence is 4 and the interval is the 3rd.
Upvotes: 2
Reputation: 16
Something like this perhaps: c(max(which(v<k)),min(which(v>k)))
Upvotes: 0