Reputation: 109844
I have two logical vectors x and y and weighted values z corresponding to each index. For each index where x is TRUE, I'd like to find the nearest index where y is also TRUE, and then take the sum of z over the inclusive range between the two indices. If two y indices are equally near, the smaller of the two sums of z is used.
       x     y          z
1  FALSE  TRUE 0.05647057
2  FALSE FALSE 0.09577802
3   TRUE FALSE 0.04150954
4  FALSE FALSE 0.07242995
5  FALSE  TRUE 0.06220041
6  FALSE FALSE 0.01861535
7  FALSE FALSE 0.05056971
8   TRUE FALSE 0.07726933
9  FALSE  TRUE 0.04669694
10  TRUE  TRUE 0.02312497
There are 3 x values that are TRUE, so we'll call them {x_1, x_2, x_3}. Below I demonstrate the sum over the indices between each x_i and its nearest y_i neighbor. What is the most efficient base R way to accomplish this? I have a method at the end that uses two nested sapply calls, which tells me it's probably not efficient. I don't have a math background, and usually there's some algebraic way to accomplish these sorts of tasks in a vectorized manner rather than by brute computational force.
## x_1
sum(z[3:5]) ## This one is smaller so use it
sum(z[1:3])
## x_2
sum(z[8:9])
## x_3
sum(z[10])
c(sum(z[3:5]), sum(z[8:9]), sum(z[10]))
[1] 0.17613990 0.12396627 0.02312497
MWE:
x <- y <- rep(FALSE, 10)
x[c(3, 8, 10)] <- TRUE
y[c(1, 5, 9, 10)] <- TRUE
set.seed(15)
z <- rnorm(10, .5, .25)/10
data.frame(x=x, y=y, z=z)
Here is an approach that is less than optimal:
dat <- data.frame(x=x, y=y, z=z)

sapply(which(dat[, "x"]), function(x) {
    ylocs <- which(dat[, "y"])               # indices where y is TRUE
    dists <- abs(x - ylocs)                  # distance from this x to each TRUE y
    min.ylocs <- ylocs[min(dists) == dists]  # nearest TRUE y (possibly a tie)
    min(sapply(min.ylocs, function(y, x2 = x) {
        sum(dat[, "z"][x2:y])                # sum of z over the inclusive range
    }))                                      # on a tie, keep the smaller sum
})
## [1] 0.17613990 0.12396627 0.02312497
I'd prefer to keep the solution within base R.
Upvotes: 0
Views: 375
Reputation: 269371
This uses no loops or apply functions. We use na.locf from zoo to carry the index of the nearest preceding TRUE y forward, giving fwd, and the index of the nearest following TRUE y backward, giving bck. Finally we determine which of the two corresponding sums is smaller. This depends on na.locf in the zoo package, but at the end we extract the core code from zoo to avoid the dependence:
library(zoo) # na.locf

x <- dat$x
y <- dat$y
z <- dat$z

yy <- ifelse(y, TRUE, NA) * seq_along(y)  # index where y is TRUE, NA otherwise
fwd <- na.locf(yy, fromLast = FALSE)[x]   # nearest TRUE y at or before each TRUE x
bck <- na.locf(yy, fromLast = TRUE)[x]    # nearest TRUE y at or after each TRUE x
cs <- cumsum(z)                           # prefix sums of z
pmin(cs[x] - cs[fwd] + z[fwd], cs[bck] - cs[x] + z[x])
The last line gives:
[1] 0.17613990 0.12396627 0.02312497
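To see why the last line reproduces the question's manual sums, note the usual prefix-sum identity: for a <= b, sum(z[a:b]) equals cs[b] - cs[a] + z[a]. A quick check, not part of the answer's code (the two forms agree up to floating-point rounding):

cs <- cumsum(z)
# x at 3, nearest TRUE y at 5 (forward case):
sum(z[3:5])            # 0.1761399, as in the question
cs[5] - cs[3] + z[3]   # same value via the prefix sums
# x at 8, nearest TRUE y at 9 (forward case):
sum(z[8:9])            # 0.1239663
cs[9] - cs[8] + z[8]   # same value via the prefix sums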
Here is a mini version of na.locf; the library call above could be replaced with this:
# Core idea extracted from the zoo package: carry along the position of the
# last (or, with fromLast = TRUE, the next) non-NA entry. For yy above the
# values are the indices themselves, so this is all that is needed here.
na.locf <- function(x, fromLast = FALSE) {
    L <- !is.na(x)
    if (fromLast) rev(c(NA, rev(which(L)))[cumsum(rev(L)) + 1])
    else c(NA, which(L))[cumsum(L) + 1L]
}
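As a quick usage sketch (assuming the same dat built in the question's MWE), the mini version reproduces fwd, bck, and the final result without loading zoo:

x <- dat$x; y <- dat$y; z <- dat$z
yy <- ifelse(y, TRUE, NA) * seq_along(y)
fwd <- na.locf(yy, fromLast = FALSE)[x]  # 1 5 10
bck <- na.locf(yy, fromLast = TRUE)[x]   # 5 9 10
cs <- cumsum(z)
pmin(cs[x] - cs[fwd] + z[fwd], cs[bck] - cs[x] + z[x])
## [1] 0.17613990 0.12396627 0.02312497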
Upvotes: 3