Reputation: 3288
As a silly toy example, suppose
x=4.5
w=c(1,2,4,6,7)
I wonder if there is a simple R function that finds the index of the closest match to x
in w
. So if foo
is that function, foo(w,x)
would return 3
. The function match
is the right idea, but seems to apply only for exact matches.
Solutions here (e.g. which.min(abs(w - x))
, which(abs(w-x)==min(abs(w-x)))
, etc.) are all O(n)
instead of log(n)
(I'm assuming that w
is already sorted).
Upvotes: 52
Views: 31948
Reputation: 2166
Based on @neal-fultz answer, here is a simple function that uses findInterval()
:
get_closest_index <- function(x, vec){
# vec must be sorted
iv <- findInterval(x, vec)
dist_left <- x - vec[ifelse(iv == 0, NA, iv)]
dist_right <- vec[iv + 1] - x
ifelse(! is.na(dist_left) & (is.na(dist_right) | dist_left < dist_right), iv, iv + 1)
}
values <- c(-15, -0.01, 3.1, 6, 10, 100)
grid <- c(-2, -0.1, 0.1, 3, 7)
get_closest_index(values, grid)
#> [1] 1 2 4 5 5 5
Created on 2020-05-29 by the reprex package (v0.3.0)
Upvotes: 1
Reputation: 49448
You can use data.table
to do a binary search:
dt = data.table(w, val = w) # you'll see why val is needed in a sec
setattr(dt, "sorted", "w") # let data.table know that w is sorted
Note that if the column w
isn't already sorted, then you'll have to use setkey(dt, w)
instead of setattr(.)
.
# binary search and "roll" to the nearest neighbour
dt[J(x), roll = "nearest"]
# w val
#1: 4.5 4
In the final expression the val
column will have the you're looking for.
# or to get the index as Josh points out
# (and then you don't need the val column):
dt[J(x), .I, roll = "nearest", by = .EACHI]
# w .I
#1: 4.5 3
# or to get the index alone
dt[J(x), roll = "nearest", which = TRUE]
#[1] 3
Upvotes: 45
Reputation: 23014
See match.closest()
from the MALDIquant package:
> library(MALDIquant)
> match.closest(x, w)
[1] 3
Upvotes: 8
Reputation: 49
NearestValueSearch = function(x, w){
## A simple binary search algo
## Assume the w vector is sorted so we can use binary search
left = 1
right = length(w)
while(right - left > 1){
middle = floor((left + right) / 2)
if(x < w[middle]){
right = middle
}
else{
left = middle
}
}
if(abs(x - w[right]) < abs(x - w[left])){
return(right)
}
else{
return(left)
}
}
x = 4.5
w = c(1,2,4,6,7)
NearestValueSearch(x, w) # return 3
Upvotes: 4
Reputation: 5
You can always implement custom binary search algorithm to find the closest value. Alternately, you can leverage standard implementation of libc bsearch(). You can use other binary search implementations as well, but it does not change the fact that you have to implement the comparing function carefully to find the closest element in array. The issue with standard binary search implementation is that it is meant for exact comparison. That means your improvised comparing function needs to do some kind of exactification to figure out if an element in array is close-enough. To achieve it, the comparing function needs to have awareness of other elements in the array, especially following aspects:
To provide this extra knowledge in comparing function, the key needs to be packaged with additional information (not just the key value). Once the comparing function have awareness on these aspects, it can figure out if the element itself is closest. When it knows that it is the closest, it returns "match".
The the following C code finds the closest value.
#include <stdio.h>
#include <stdlib.h>
struct key {
int key_val;
int *array_head;
int array_size;
};
int compar(const void *k, const void *e) {
struct key *key = (struct key*)k;
int *elem = (int*)e;
int *arr_first = key->array_head;
int *arr_last = key->array_head + key->array_size -1;
int kv = key->key_val;
int dist_left;
int dist_right;
if (kv == *elem) {
/* easy case: if both same, got to be closest */
return 0;
} else if (key->array_size == 1) {
/* easy case: only element got to be closest */
return 0;
} else if (elem == arr_first) {
/* element is the first in array */
if (kv < *elem) {
/* if keyval is less the first element then
* first elem is closest.
*/
return 0;
} else {
/* check distance between first and 2nd elem.
* if distance with first elem is smaller, it is closest.
*/
dist_left = kv - *elem;
dist_right = *(elem+1) - kv;
return (dist_left <= dist_right) ? 0:1;
}
} else if (elem == arr_last) {
/* element is the last in array */
if (kv > *elem) {
/* if keyval is larger than the last element then
* last elem is closest.
*/
return 0;
} else {
/* check distance between last and last-but-one.
* if distance with last elem is smaller, it is closest.
*/
dist_left = kv - *(elem-1);
dist_right = *elem - kv;
return (dist_right <= dist_left) ? 0:-1;
}
}
/* condition for remaining cases (other cases are handled already):
* - elem is neither first or last in the array
* - array has atleast three elements.
*/
if (kv < *elem) {
/* keyval is smaller than elem */
if (kv <= *(elem -1)) {
/* keyval is smaller than previous (of "elem") too.
* hence, elem cannot be closest.
*/
return -1;
} else {
/* check distance between elem and elem-prev.
* if distance with elem is smaller, it is closest.
*/
dist_left = kv - *(elem -1);
dist_right = *elem - kv;
return (dist_right <= dist_left) ? 0:-1;
}
}
/* remaining case: (keyval > *elem) */
if (kv >= *(elem+1)) {
/* keyval is larger than next (of "elem") too.
* hence, elem cannot be closest.
*/
return 1;
}
/* check distance between elem and elem-next.
* if distance with elem is smaller, it is closest.
*/
dist_right = *(elem+1) - kv;
dist_left = kv - *elem;
return (dist_left <= dist_right) ? 0:1;
}
int main(int argc, char **argv) {
int arr[] = {10, 20, 30, 40, 50, 60, 70};
int *found;
struct key k;
if (argc < 2) {
return 1;
}
k.key_val = atoi(argv[1]);
k.array_head = arr;
k.array_size = sizeof(arr)/sizeof(int);
found = (int*)bsearch(&k, arr, sizeof(arr)/sizeof(int), sizeof(int),
compar);
if(found) {
printf("found closest: %d\n", *found);
} else {
printf("closest not found. absurd! \n");
}
return 0;
}
Needless to say that bsearch() in above example should never fail (unless the array size is zero).
If you implement your own custom binary search, essentially you have to embed same comparing logic in the main body of binary search code (instead of having this logic in comparing function in above example).
Upvotes: -2
Reputation: 9687
To do this on character vectors, Martin Morgan suggested this function on R-help:
bsearch7 <-
function(val, tab, L=1L, H=length(tab))
{
b <- cbind(L=rep(L, length(val)), H=rep(H, length(val)))
i0 <- seq_along(val)
repeat {
updt <- M <- b[i0,"L"] + (b[i0,"H"] - b[i0,"L"]) %/% 2L
tabM <- tab[M]
val0 <- val[i0]
i <- tabM < val0
updt[i] <- M[i] + 1L
i <- tabM > val0
updt[i] <- M[i] - 1L
b[i0 + i * length(val)] <- updt
i0 <- which(b[i0, "H"] >= b[i0, "L"])
if (!length(i0)) break;
}
b[,"L"] - 1L
}
Upvotes: 3
Reputation: 9687
R>findInterval(4.5, c(1,2,4,5,6))
[1] 3
will do that with price-is-right matching (closest without going over).
Upvotes: 46
Reputation: 2055
x = 4.5
w = c(1,2,4,6,7)
closestLoc = which(min(abs(w-x)))
closestVal = w[which(min(abs(w-x)))]
# On my phone- please pardon typos
If your vector is lengthy, try a 2-step approach:
x = 4.5
w = c(1,2,4,6,7)
sdev = sapply(w,function(v,x) abs(v-x), x = x)
closestLoc = which(min(sdev))
for maddeningly long vectors (millions of rows!, warning- this will actually be slower for data which is not very, very, very large.)
require(doMC)
registerDoMC()
closestLoc = which(min(foreach(i = w) %dopar% {
abs(i-x)
}))
This example is just to give you a basic idea of leveraging parallel processing when you have huge data. Note, I do not recommend you use it for simple & fast functions like abs().
Upvotes: 3