Reputation: 1645
I have a data frame with columns A and B, shown below. I would like to calculate the mean of the values in column B over a sliding window. The window size is not constant; it is set by column A: each window starts at a row and contains every row whose A value is at most 200 greater than the starting value. The example below shows the windows:
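A:  10   150  200  220  300  350  400  410  500
B:  0    0    0    1    0    1    1    1    0

window (A range)   B values       mean
[10, 210]          [0 0 0]        0
[150, 350]         [0 0 1 0 1]    0.4
[200, 400]         [0 1 0 1 1]    0.6
[220, 420]         [1 0 1 1 1]    0.8
[300, 500]         [0 1 1 1 0]    0.6
[350, 550]         [1 1 1 0]      0.75
[400, 600]         [1 1 0]        0.67
[410, 610]         [1 0]          0.5
[500, 700]         [0]            0

Output:  0  0.4  0.6  0.8  0.8  0.8  0.8  0.8  0.75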
Then, for each row of column A, all windows containing that row are considered, and the row keeps the highest of their means. This gives the values in the Output row above.
The resulting data frame should look like this:
A B Output
10 0 0
150 0 0.4
200 0 0.6
220 1 0.8
300 0 0.8
350 1 0.8
400 1 0.8
410 1 0.8
500 0 0.75
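To make the selection rule concrete, here is a small R sketch that takes the window means from the table above (with the [1 1 0] window written exactly as 2/3) and, for each row, keeps the largest mean among the windows covering it. win_mean and win_end are illustrative names, not part of any package; win_end is the last row index of each window, read off the table by hand.

```r
# Window means from the table (window j starts at row j)
win_mean <- c(0, 0.4, 0.6, 0.8, 0.6, 0.75, 2/3, 0.5, 0)
# Last row of each window, read off the table: window j covers rows j..win_end[j]
win_end <- c(3, 6, 7, 8, 9, 9, 9, 9, 9)
n <- length(win_mean)

output <- vapply(seq_len(n), function(i) {
  # windows that contain row i: they start at or before i and end at or after i
  covering <- seq_len(n) <= i & win_end >= i
  max(win_mean[covering])
}, numeric(1))

print(output)
```

This reproduces the Output column; the answers below compute win_mean and win_end from A and B instead of hard-coding them.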
There is a similar question, Sliding window in R, where
rollapply(B, 2*k-1, function(x) max(rollmean(x, k)), partial = TRUE)
(rollapply() and rollmean() come from the zoo package) gives the solution for a constant window size k. The difference here is that the window size is not constant: it depends on the values in column A.
Could someone provide a solution in R?
Upvotes: 1
Views: 2313
Reputation: 13122
This seems to work:
#data
DF <- data.frame(A = c(10, 150, 200, 220, 300, 350, 400, 410, 500),
B = c(0, 0, 0, 1, 0, 1, 1, 1, 0))
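#index of the last row in each window: the largest A <= A[i] + 200
#(findInterval() requires DF$A to be sorted in increasing order)
rolls <- findInterval(DF$A + 200, DF$A)
#mean of B over each window (rows from..to)
fun <- function(from, to) { mean(DF$B[from:to]) }
means <- mapply(fun, seq_len(nrow(DF)), rolls)
#TRUE when row x lies inside the window from..to
fun2 <- function(x, from, to) { x %in% from:to }
output <- rep(NA, nrow(DF))
for(i in seq_len(nrow(DF)))
{
  #keep the highest mean among all windows that contain row i
  output[i] <- max(means[mapply(fun2, i, seq_len(nrow(DF)), rolls)])
}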
DF$output <- output
> DF
A B output
1 10 0 0.00
2 150 0 0.40
3 200 0 0.60
4 220 1 0.80
5 300 0 0.80
6 350 1 0.80
7 400 1 0.80
8 410 1 0.80
9 500 0 0.75
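A possible variation on the same idea, sketched under the same assumptions (DF$A sorted ascending, value limit 200): compute all window means at once from cumulative sums of B, then build a logical "row i is in window j" matrix with outer() instead of calling mapply() once per row. The names window, starts, ends, cs and covers are illustrative. The matrix is n x n, so this suits moderate numbers of rows.

```r
DF <- data.frame(A = c(10, 150, 200, 220, 300, 350, 400, 410, 500),
                 B = c(0, 0, 0, 1, 0, 1, 1, 1, 0))
window <- 200  # value limit on column A, as in the question

starts <- seq_len(nrow(DF))
# last row of each window; assumes DF$A is sorted in increasing order
ends <- findInterval(DF$A + window, DF$A)

# window means from cumulative sums: sum(B[s..e]) = cs[e + 1] - cs[s]
cs <- c(0, cumsum(DF$B))
means <- (cs[ends + 1] - cs[starts]) / (ends - starts + 1)

# covers[i, j] is TRUE when row i lies in window j, i.e. j <= i <= ends[j]
covers <- outer(starts, starts, ">=") & outer(starts, ends, "<=")

# for each row, the highest mean among the windows that contain it
DF$output <- apply(covers, 1, function(in_window) max(means[in_window]))
DF
```

Every row is at least in its own window, so max() never receives an empty vector.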
Upvotes: 0
Reputation: 121077
Data in a reproducible form:
data <- data.frame(
A = c(10, 150, 200, 220, 300, 350, 400, 410, 500) ,
B = c(0, 0, 0, 1, 0, 1, 1, 1, 0)
)
window_size <- 200
Just use vapply (or sapply) to loop over the values of A and calculate the mean of the appropriate subset of B.
data$Output <- with(
data,
vapply(
A,
function(x)
{
index <- x <= A & A <= x + window_size
mean(B[index])
},
numeric(1)
)
)
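As written, the vapply above returns the mean of the window that starts at each value of A (0, 0.4, 0.6, 0.8, 0.6, 0.75, 0.67, 0.5, 0 for this data), which is the per-window mean column of the question rather than the requested Output. A minimal sketch of the missing step keeps the same vapply style and adds a second pass that takes, for each value, the maximum over all windows containing it; win_mean is an illustrative name.

```r
data <- data.frame(
  A = c(10, 150, 200, 220, 300, 350, 400, 410, 500),
  B = c(0, 0, 0, 1, 0, 1, 1, 1, 0)
)
window_size <- 200

# Step 1 (as in the answer): mean of the window starting at each value of A
win_mean <- with(data, vapply(A, function(x) {
  mean(B[x <= A & A <= x + window_size])
}, numeric(1)))

# Step 2: window j contains the value a when A[j] <= a <= A[j] + window_size;
# keep the largest mean among those windows
data$Output <- with(data, vapply(A, function(a) {
  max(win_mean[A <= a & a <= A + window_size])
}, numeric(1)))

data
```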
Upvotes: 1
Reputation: 307
Try this:
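a = c(10, 150, 200, 250, 300, 350, 400)
b = c(0, 0, 0, 1, 1, 1, 0)
win_mean = rep(0, length(a))  # not named `mean`, to avoid masking base::mean()
window = 200
for (i in seq_along(a)) {
  # indices of the values in [a[i], a[i] + window]
  vals = which(a >= a[i] & a <= a[i] + window)
  win_mean[i] = mean(b[vals])
}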
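The loop above gives the mean of the window starting at each a[i]; to get the Output asked for in the question, a second pass can keep, for each value, the highest mean among the windows that contain it. A sketch using this answer's own sample vectors (which differ from the question's data); win_mean and output are illustrative names.

```r
a <- c(10, 150, 200, 250, 300, 350, 400)
b <- c(0, 0, 0, 1, 1, 1, 0)
window <- 200

# pass 1 (as in the answer): mean of the window starting at each a[i]
win_mean <- numeric(length(a))
for (i in seq_along(a)) {
  vals <- which(a >= a[i] & a <= a[i] + window)
  win_mean[i] <- mean(b[vals])
}

# pass 2: window j contains a[i] when a[j] <= a[i] <= a[j] + window;
# keep the highest mean among those windows
output <- numeric(length(a))
for (i in seq_along(a)) {
  output[i] <- max(win_mean[a <= a[i] & a[i] <= a + window])
}
output
```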
Upvotes: 0