Reputation: 151
I would like to find out the three closest numbers in a vector. Something like
v = c(10,23,25,26,38,50)
c = findClosest(v,3)
c
23 25 26
I tried with sort(colSums(as.matrix(dist(x))))[1:3]
, and it kind of works, but it selects the three numbers with minimum overall distance not the three closest numbers.
There is already an answer for matlab, but I do not know how to translate it to R:
%finds the index with the minimal difference in A
minDiffInd = find(abs(diff(A))==min(abs(diff(A))));
%extract this index, and it's neighbor index from A
val1 = A(minDiffInd);
val2 = A(minDiffInd+1);
How to find two closest (nearest) values within a vector in MATLAB?
Upvotes: 15
Views: 2061
Reputation: 814
For use in a dataframe,
data%>%
group_by(var1,var2)%>%
do(data.frame(findClosest(.$val,3)))
Upvotes: 0
Reputation: 11255
My assumption is that the for the n
nearest values, the only thing that matters is the difference between the v[i] - v[i - (n-1)]
. That is, finding the minimum of diff(x, lag = n - 1L)
.
findClosest <- function(x, n) {
x <- sort(x)
x[seq.int(which.min(diff(x, lag = n - 1L)), length.out = n)]
}
findClosest(v, 3L)
[1] 23 25 26
Upvotes: 16
Reputation: 388862
A base R option, idea being we first sort
the vector and subtract every i
th element with i + n - 1
element in the sorted vector and select the group which has minimum difference.
closest_n_vectors <- function(v, n) {
v1 <- sort(v)
inds <- which.min(sapply(head(seq_along(v1), -(n - 1)), function(x)
v1[x + n -1] - v1[x]))
v1[inds: (inds + n - 1)]
}
closest_n_vectors(v, 3)
#[1] 23 25 26
closest_n_vectors(c(2, 10, 1, 20, 4, 5, 23), 2)
#[1] 1 2
closest_n_vectors(c(19, 23, 45, 67, 89, 65, 1), 2)
#[1] 65 67
closest_n_vectors(c(19, 23, 45, 67, 89, 65, 1), 3)
#[1] 1 19 23
In case of tie this will return the numbers with smallest value since we are using which.min
.
BENCHMARKS
Since we have got quite a few answers, it is worth doing a benchmark of all the solutions till now
set.seed(1234)
x <- sample(100000000, 100000)
identical(findClosest_antoine(x, 3), findClosest_Sotos(x, 3),
closest_n_vectors_Ronak(x, 3), findClosest_Cole(x, 3))
#[1] TRUE
microbenchmark::microbenchmark(
antoine = findClosest_antoine(x, 3),
Sotos = findClosest_Sotos(x, 3),
Ronak = closest_n_vectors_Ronak(x, 3),
Cole = findClosest_Cole(x, 3),
times = 10
)
#Unit: milliseconds
# expr min lq mean median uq max neval cld
#antoine 148.751 159.071 163.298 162.581 167.365 181.314 10 b
# Sotos 1086.098 1349.762 1372.232 1398.211 1453.217 1553.945 10 c
# Ronak 54.248 56.870 78.886 83.129 94.748 100.299 10 a
# Cole 4.958 5.042 6.202 6.047 7.386 7.915 10 a
Upvotes: 7
Reputation: 51582
An idea is to use zoo
library to do a rolling operation, i.e.
library(zoo)
m1 <- rollapply(v, 3, by = 1, function(i)c(sum(diff(i)), c(i)))
m1[which.min(m1[, 1]),][-1]
#[1] 23 25 26
Or make it into a function,
findClosest <- function(vec, n) {
require(zoo)
vec1 <- sort(vec)
m1 <- rollapply(vec1, n, by = 1, function(i) c(sum(diff(i)), c(i)))
return(m1[which.min(m1[, 1]),][-1])
}
findClosest(v, 3)
#[1] 23 25 26
Upvotes: 6
Reputation: 6921
Let's define "nearest numbers" by "numbers with minimal sum of L1 distances". You can achieve what you want by a combination of diff
and windowed sum.
You could write a much shorter function but I wrote it step by step to make it easier to follow.
v <- c(10,23,25,26,38,50)
#' Find the n nearest numbers in a vector
#'
#' @param v Numeric vector
#' @param n Number of nearest numbers to extract
#'
#' @details "Nearest numbers" defined as the numbers which minimise the
#' within-group sum of L1 distances.
#'
findClosest <- function(v, n) {
# Sort and remove NA
v <- sort(v, na.last = NA)
# Compute L1 distances between closest points. We know each point is next to
# its closest neighbour since we sorted.
delta <- diff(v)
# Compute sum of L1 distances on a rolling window with n - 1 elements
# Why n-1 ? Because we are looking at deltas and 2 deltas ~ 3 elements.
withingroup_distances <- zoo::rollsum(delta, k = n - 1)
# Now it's simply finding the group with minimum within-group sum
# And working out the elements
group_index <- which.min(withingroup_distances)
element_indices <- group_index + 0:(n-1)
v[element_indices]
}
findClosest(v, 2)
# 25 26
findClosest(v, 3)
# 23 25 26
Upvotes: 7