Reputation: 3175
I am looking for an efficient way of doing this:
Given a vector `x` (you may assume the values are sorted):
x <- c(0.2, 0.8, 2.3, 5.8, 9.9, 10)
and a vector `y` of regularly spaced values along an interval, e.g., with a step of 1 from 0 through 10:
y <- 0:10
how do I obtain the vector `z` in which each value of `x` has been mapped to its closest value in `y`:
> z
[1] 0 1 2 6 10 10
Edit: obviously, this example is simple, but I would like it to work for any regularly spaced vector `y`, i.e., not just for this case of step 1.
library(microbenchmark)
set.seed(42)
yMin <- -6
stepSize <- 0.001
x <- rnorm(10000)
y <- seq(yMin, 6, by = stepSize)
# Onyambu's first answer.
fn1 <- function(x, y) y[max.col(-abs(outer(x, y, "-")))]
# Onyambu's second answer.
fn2 <- function(x, y) y[findInterval(x, c(-Inf, y+diff(y[1:2]) / 2, Inf))]
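# Plonetheus' answer: although it works on my simple example (yMin = 0, stepSize = 1),
# it does not work in general, e.g., when yMin is negative: the round-up/round-down
# test compares x[i] with step counts instead of comparing numSteps with its floor and ceiling.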
fn3 <- function(x, yMin, stepSize) {
  z <- rep(0, length(x))
  for (i in 1:length(x)) {
    numSteps <- (x[i] - yMin) / stepSize # approximately how many steps do we need
    if (x[i] - floor(numSteps) < ceiling(numSteps) - x[i]) { # check if we need to round up or down
      z[i] <- yMin + floor(numSteps) * stepSize # edited to add yMin
    } else {
      z[i] <- yMin + ceiling(numSteps) * stepSize # edited to add yMin
    }
  }
  return(z)
}
# Thiagogpsm's answer.
fn4 <- function(x, y) sapply(x, function(x_i, y) y[which.min(abs(x_i - y))], y)
microbenchmark(
  fn1(x, y),
  fn2(x, y),
  fn3(x, yMin, stepSize),
  fn4(x, y),
  times = 3L)
#> Unit: milliseconds
#>                   expr         min          lq        mean      median
#>              fn1(x, y) 5546.804339 5598.159531 6759.516597 5649.514724
#>              fn2(x, y)    1.252469    1.705517    3.695469    2.158564
#> fn3(x, yMin, stepSize)    3.176284    3.190868   11.372397    3.205453
#>              fn4(x, y)  888.288538 1843.955232 3489.842765 2799.621925
#>          uq         max neval cld
#> 7365.872725 9082.230727     3   b
#>    4.916968    7.675373     3   a
#>   15.470453   27.735453     3   a
#> 4790.619879 6781.617833     3  ab
### Verdict
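The second solution, `fn2` in my benchmark above (Onyambu's second answer, based on `findInterval`), is the fastest, with a median of about 2.2 ms. The solution proposed by Plonetheus (`fn3`) is a close second at about 3.2 ms, but this benchmark uses a negative `yMin`, for which `fn3` does not return the closest values.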
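A speed ranking is only meaningful among solutions that return the right values, so a quick correctness check is a useful companion to the benchmark. The sketch below is illustrative: it assumes a smaller `x` and a coarser step than the benchmark so that the brute-force reference (the same `which.min` idea as `fn4`) runs quickly, and it copies `fn2` and `fn3` from above. Given the note that `fn3` fails for a negative `yMin`, it should disagree with the reference here while `fn2` agrees.

```r
set.seed(42)
yMin <- -6
stepSize <- 0.01                  # assumption: coarser than the benchmark's 0.001
x <- rnorm(1000)                  # assumption: smaller than the benchmark's 10000
y <- seq(yMin, 6, by = stepSize)

# Brute-force reference: scan all of y for every x value (same idea as fn4)
ref <- vapply(x, function(xi) y[which.min(abs(xi - y))], numeric(1))

# Onyambu's second answer (fn2 above)
fn2 <- function(x, y) y[findInterval(x, c(-Inf, y + diff(y[1:2]) / 2, Inf))]

# Plonetheus' answer as benchmarked (fn3 above)
fn3 <- function(x, yMin, stepSize) {
  z <- rep(0, length(x))
  for (i in 1:length(x)) {
    numSteps <- (x[i] - yMin) / stepSize
    if (x[i] - floor(numSteps) < ceiling(numSteps) - x[i]) {
      z[i] <- yMin + floor(numSteps) * stepSize
    } else {
      z[i] <- yMin + ceiling(numSteps) * stepSize
    }
  }
  z
}

isTRUE(all.equal(fn2(x, y), ref))     # TRUE: fn2 matches the reference
mean(fn3(x, yMin, stepSize) != ref)   # share of x values fn3 maps to the wrong grid point
```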
Upvotes: 1
Views: 75
Reputation: 754
If you know the minimum of y and how large each step is, then I believe you can do something like the following to solve it in O(N) time:
getZ <- function(x, yMin, stepSize) {
  z <- numeric(length(x))
  for (i in seq_along(x)) {
    numSteps <- (x[i] - yMin) / stepSize # position of x[i] on the grid, in steps from yMin
    if (numSteps - floor(numSteps) < ceiling(numSteps) - numSteps) { # is the lower grid point closer?
      z[i] <- yMin + floor(numSteps) * stepSize # round down
    } else {
      z[i] <- yMin + ceiling(numSteps) * stepSize # round up
    }
  }
  return(z)
}
With these values, for example,
x <- c(0.2, 0.8, 2.3, 5.8, 9.9, 10)
yMin <- 0
stepSize <- 0.3
print(getZ(x, yMin, stepSize))
we get the expected output of:
[1] 0.3 0.9 2.4 5.7 9.9 9.9
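The same idea (count how many steps `x` lies from `yMin`, then round to the nearest whole step) can be written without a loop, because `round()` works element-wise on a vector. This is a sketch under assumptions, not the answer's code: `snap_to_grid()` is a hypothetical name, the optional `yMax` argument is an addition that clamps values falling outside the grid (otherwise they would map to points not in `y`), and the `1e-9` tolerance guarding the last grid index is an arbitrary choice. Note that `round()` sends exact halfway values to the even index.

```r
# snap_to_grid() is a hypothetical helper: a vectorized version of the rounding idea
snap_to_grid <- function(x, yMin, stepSize, yMax = Inf) {
  # Nearest grid index counted from yMin; round() works on the whole vector
  k <- round((x - yMin) / stepSize)
  # Largest valid index; the small tolerance absorbs floating-point error
  kMax <- floor((yMax - yMin) / stepSize + 1e-9)
  # Clamp indices so values below yMin or above yMax map to the end points
  k <- pmin(pmax(k, 0), kMax)
  yMin + k * stepSize
}

x <- c(0.2, 0.8, 2.3, 5.8, 9.9, 10)
snap_to_grid(x, yMin = 0, stepSize = 1, yMax = 10)           # 0 1 2 6 10 10
snap_to_grid(c(-7, 12), yMin = 0, stepSize = 1, yMax = 10)   # 0 10
```

Because it is a single vectorized expression, its cost is linear in `length(x)` and independent of `length(y)`.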
Upvotes: 1
Reputation: 79208
One way could be:
y[max.col(-abs(outer(x, y, "-")))]
[1] 0 1 2 6 10 10
For example:
x1 <- c(0.01, 2.4, 1.3, 4.1, 6.2)
y1 <- c(1, 3, 5, 7, 9)
Results:
y1[max.col(-abs(outer(x1, y1, "-")))]
[1] 1 3 1 5 7
i.e., we see that 0.01 is closest to 1 in the vector `y1`, 2.4 to 3, 1.3 to 1, 4.1 to 5 and 6.2 to 7, as expected.
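To see what `max.col()` is doing here, the sketch below builds the distance matrix for `x1` and `y1` explicitly. It passes `ties.method = "first"`, which is an addition to the answer: according to the `max.col` documentation, the default `"random"` breaks ties at random and treats entries within a relative tolerance of 1e-5 of the row's largest-magnitude entry as ties, so an `x` value close to a midpoint could be mapped to either neighbor. The matrix also explains the cost of `fn1` in the benchmark: with 10000 `x` values and 12001 grid points, each intermediate matrix holds about 120 million doubles, roughly 0.96 GB.

```r
x1 <- c(0.01, 2.4, 1.3, 4.1, 6.2)
y1 <- c(1, 3, 5, 7, 9)

# Distance matrix: row i holds |x1[i] - y1[j]| for every grid point j
d <- abs(outer(x1, y1, "-"))
dim(d)   # 5 5

# max.col() finds the column of each row's largest entry, so negate the distances;
# "first" resolves exact ties deterministically (the earlier column, i.e. the
# smaller value of a sorted y1, wins)
y1[max.col(-d, ties.method = "first")]   # 1 3 1 5 7

# 2 is exactly halfway between 1 and 3: with "first" it always maps to 1
y1[max.col(-abs(outer(2, y1, "-")), ties.method = "first")]   # 1
```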
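If `y` is sorted, you could use the function `findInterval` (it requires non-decreasing breakpoints, but `x` can be in any order). Since the step is constant, the breakpoints are simply the values of `y` shifted by half a step: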
y[findInterval(x, c(-Inf, y+diff(y[1:2]) / 2, Inf))]
[1] 0 1 2 6 10 10
y1[findInterval(x1, c(-Inf, y1+diff(y1[1:2])/2, Inf))]
[1] 1 3 1 5 7
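The breakpoints passed to `findInterval()` are the midpoints between neighboring grid values, so the same trick extends to any sorted `y`, regularly spaced or not, by computing those midpoints directly. In the sketch below, `nearest_sorted()` is a hypothetical name. Dropping the trailing `Inf` also changes one edge case: with `c(-Inf, y + diff(y[1:2]) / 2, Inf)`, an `x` beyond the last midpoint (e.g. `10.7` when `y <- 0:10`) gets index `length(y) + 1` and therefore `NA`, whereas here it maps to the last element of `y`. An `x` lying exactly on a midpoint goes to the upper neighbor.

```r
# nearest_sorted() is a hypothetical helper; y must be sorted increasingly,
# x can be in any order
nearest_sorted <- function(x, y) {
  # Midpoints between neighboring y values are the decision boundaries
  mids <- (head(y, -1) + tail(y, -1)) / 2
  # With -Inf as the first break, findInterval() returns an index in 1..length(y)
  y[findInterval(x, c(-Inf, mids))]
}

nearest_sorted(c(0.2, 0.8, 2.3, 5.8, 9.9, 10, 10.7), 0:10)   # 0 1 2 6 10 10 10

# Irregular spacing works too
nearest_sorted(c(0.4, 2, 7), c(0, 1, 5, 10))                # 0 1 5
```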
Upvotes: 2
Reputation: 469
One way is to create a function that returns `z_i` for each `x_i` and apply it to the vector:
map_to_closest <- function(x_i, y) {
  y[which.min(abs(x_i - y))]
}
sapply(x, map_to_closest, y)
[1] 0 1 2 6 10 10
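A variant of the same approach, sketched below, uses `vapply()` instead of `sapply()` so that R checks that every call returns a single number; this is a stylistic suggestion, not part of the answer. Two properties are worth knowing: `which.min()` returns the first minimum, so a value exactly halfway between two sorted grid points maps to the lower one, and each call scans all of `y`, so the work grows with `length(x) * length(y)`, although unlike `outer()` it never materializes the full distance matrix.

```r
map_to_closest <- function(x_i, y) {
  # Index of the first grid value with the smallest absolute distance
  y[which.min(abs(x_i - y))]
}

x <- c(0.2, 0.8, 2.3, 5.8, 9.9, 10)
y <- 0:10

# numeric(1): every call must return exactly one number
vapply(x, map_to_closest, numeric(1), y = y)   # 0 1 2 6 10 10

# 2.5 is halfway between 2 and 3; which.min() keeps the first, i.e. 2
map_to_closest(2.5, y)   # 2
```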
Upvotes: 1