Reputation: 217
I'm trying to calculate the cumulative sums of a vector whose elements are 0, 1 or NA, subject to the following conditions:
1) Calculate the cumulative sums only between the NA values, restarting after each NA
2) If a 0 comes directly after a 1, the cumulative sum should also increase by 1 at that 0
out is the original vector:
out[1:100]
[1] NA NA NA NA 0 1 1 NA NA NA 1 NA 0 NA 0 1 NA NA 0 NA 0 1 0 0 0 NA 0 1 0 1 0 0 1 0 1 1 0 0 0 0
[41] 1 0 NA 0 0 NA 1 NA 0 1 NA 0 NA 0 1 1 NA 1 NA 0 0 0 1 1 NA NA NA 0 0 NA 0 0 0 1 0 NA 1 0 NA 0
[81] 1 1 0 1 1 0 1 0 NA 0 1 0 1 0 NA 0 1 0 0 1
I used this code to calculate the cumulative sums in between the NA values:
g <- cumsum(is.na(head(c(0, out), -1)))
out1 <- ave(out, g, FUN = cumsum)
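To see why the grouping uses a lagged is.na(), here is a small sketch on a made-up vector (x is a toy input assumed for illustration, not the data above). Since cumsum() returns NA from the first NA onward, each NA has to close its group rather than open it:

```r
# Toy input, assumed for illustration only
x <- c(NA, 0, 1, 1, NA, 1, 0)

# Lagged NA indicator: the group id increases on the element *after* each NA,
# so every NA ends up as the last element of its group
g <- cumsum(is.na(head(c(0, x), -1)))

# Per-group cumulative sum; each NA stays NA and the count restarts after it
res <- ave(x, g, FUN = cumsum)

print(g)
print(res)
```

Grouping on cumsum(is.na(x)) without the lag would put each NA first in its group, and the whole group would then come back as NA.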
I get
out1[1:100]
[1] NA NA NA NA 0 1 2 NA NA NA 1 NA 0 NA 0 1 NA NA 0 NA 0 1 1 1 1 NA 0 1 1 2 2 2 3 3 4 5 5 5 5 5
[41] 6 6 NA 0 0 NA 1 NA 0 1 NA 0 NA 0 1 2 NA 1 NA 0 0 0 1 2 NA NA NA 0 0 NA 0 0 0 1 1 NA 1 1 NA 0
[81] 1 2 2 3 4 4 5 5 NA 0 1 1 2 2 NA 0 1 1 1 2
The remaining problem is that I also want the cumulative sum to increase by 1 whenever a 1 is followed by a 0 (only at the first 0 of a run of zeros),
e.g.
0 1 1 0 0 0 1 0 1 1 NA
With the code I have now I would get
0 1 2 2 2 2 3 3 4 5 NA
but what I want is:
0 1 2 3 3 3 4 5 6 7 NA
Can anybody help? Thank you.
Upvotes: 3
Views: 329
Reputation: 5951
There should be an easier way, but you can try this:
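# Work on a copy in which NA is treated as 0
temp1 <- out
temp1[is.na(temp1)] <- 0
# temp2[i] is 1 where element i is 0 and element i - 1 is 1
temp2 <- (temp1[-1] == 0) * (temp1[-length(temp1)] == 1)
temp2 <- c(0, temp2)
# Running total of values plus bumps, minus the total reached at the most recent NA
out1 <- cumsum(temp1 + temp2) - cummax(cumsum(temp1 + temp2) * is.na(out))
out1[is.na(out)] <- NA
rm(temp1, temp2)
out1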
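As a rough check of how the reset works, the same steps can be traced on the short example from the question. This is only a sketch: the trailing 0 1 0 after the NA is an assumed extension to show the restart, and the names filled, bump, total and offset are mine, not part of the code above.

```r
# Example from the question, plus an assumed tail (0, 1, 0) after the NA
out <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 1, NA, 0, 1, 0)

# Treat NA as 0 so a single cumsum can run over the whole vector
filled <- ifelse(is.na(out), 0, out)

# 1 wherever a 0 directly follows a 1 (an NA counts as 0 at this point)
bump <- c(0, (filled[-1] == 0) * (filled[-length(filled)] == 1))

# Running total over everything, ignoring the NA boundaries
total <- cumsum(filled + bump)

# Total reached at the most recent NA; the running total never decreases,
# so cummax carries that value forward until the next NA
offset <- cummax(total * is.na(out))

# Subtracting the offset restarts the count after each NA
res <- total - offset
res[is.na(out)] <- NA
print(res)
```

The bump that gets counted at an NA following a 1 is harmless here, because it is already included in the offset that is subtracted from everything after that NA.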
Upvotes: 0
Reputation: 14346
Try this
out <- c(NA, NA, NA, NA, 0, 1, 1, NA, NA, NA, 1, NA, 0, NA, 0, 1, NA,
NA, 0, NA, 0, 1, 0, 0, 0, NA, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0,
0, 0, 0, 1, 0, NA, 0, 0, NA, 1, NA, 0, 1, NA, 0, NA, 0, 1, 1,
NA, 1, NA, 0, 0, 0, 1, 1, NA, NA, NA, 0, 0, NA, 0, 0, 0, 1, 0,
NA, 1, 0, NA, 0, 1, 1, 0, 1, 1, 0, 1, 0, NA, 0, 1, 0, 1, 0, NA,
0, 1, 0, 0, 1, NA, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, NA)
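as.numeric(unlist(lapply(split(out, cumsum(is.na(out))),
  function(x) {
    # Each piece begins with the NA that opened it (out starts with NA here)
    if (length(x) == 1) return(x)
    # Positions within x of each 1 that is directly followed by a 0
    idx <- which(x[-length(x)] == 1 & x[-1] == 0)
    # Plain cumulative sum, skipping the leading NA
    res <- cumsum(x[-1])
    # Add i from the i-th such 0 up to just before the next one (or to the end)
    for (i in seq_along(idx)) {
      if (i == length(idx))
        res[seq(idx[i], length(res))] <- res[seq(idx[i], length(res))] + i
      else
        res[seq(idx[i], idx[i + 1] - 1)] <- res[seq(idx[i], idx[i + 1] - 1)] + i
    }
    c(NA, res)
  }
)))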
# [1] NA NA NA NA 0 1 2 NA NA NA 1 NA 0 NA 0 1 NA NA 0 NA 0 1 2 2 2
# [26] NA 0 1 2 3 4 4 5 6 7 8 9 9 9 9 10 11 NA 0 0 NA 1 NA 0 1
# [51] NA 0 NA 0 1 2 NA 1 NA 0 0 0 1 2 NA NA NA 0 0 NA 0 0 0 1 2
# [76] NA 1 2 NA 0 1 2 3 4 5 6 7 8 NA 0 1 2 3 4 NA 0 1 2 2 3
#[101] NA 0 1 2 3 3 3 4 5 6 7 NA
Alternatively, you can calculate out1 keeping your existing method (using ave) and then "add" the missing bits to it by identifying the sequences that need to be added:
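na.pos <- which(is.na(out))
# Positions of each 1 that is directly followed by a 0
idx <- which(out[-length(out)] == 1 & out[-1] == 0)
# Positions whose next element is NA, or where a 1 is followed by a 0
idx2 <- which(is.na(out[-1]) | (out[-length(out)] == 1 & out[-1] == 0))
# Each correction starts at the 0 and runs up to the next entry of idx2
beg <- idx + 1
end <- idx2[findInterval(idx, idx2) + 1]
# Amount to add: 1, 2, 3, ... restarting within each NA-delimited block
to.add <- as.numeric(unlist(sapply(rle(findInterval(idx, na.pos))$lengths, seq, from = 1)))
for (i in seq_along(beg))
  out1[seq(beg[i], end[i])] <- out1[seq(beg[i], end[i])] + to.add[i]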
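For comparison, a more compact route might be to fold the extra 1 into the values before taking the grouped cumulative sum, so that no correction pass is needed afterwards. This is a different technique from the two above and only a sketch; cumsum_bump is a hypothetical helper name, and it reuses the lagged grouping from the question.

```r
# Hypothetical helper: grouped cumsum with +1 at each 0 that directly follows a 1
cumsum_bump <- function(x) {
  # Previous element of x (NA for the first position)
  prev <- c(NA, head(x, -1))
  # TRUE where a 0 directly follows a 1; NA positions are forced to FALSE
  bump <- !is.na(x) & !is.na(prev) & x == 0 & prev == 1
  # Same grouping as in the question: each NA closes its group
  g <- cumsum(is.na(head(c(0, x), -1)))
  # x + bump is NA wherever x is NA, so the NA positions are preserved
  ave(x + bump, g, FUN = cumsum)
}

# Short example from the question
example <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 1, NA)
print(cumsum_bump(example))
# 0 1 2 3 3 3 4 5 6 7 NA
```

Because the bump is only set where both neighbours are non-NA, a 0 that follows an NA never picks up an extra 1 from the previous block.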
Upvotes: 1