Pat

Reputation: 217

Conditional cumsum based on the next value of a vector

I'm trying to calculate the cumulative sum of a vector whose elements are 0, 1 and NA, under the following conditions:

1) Calculate the cumsum only between NA values, restarting after each NA.

2) If a 0 comes directly after a 1, that 0 should add 1 to the cumsum (cumsum + 1).

out is the original vector:

out[1:100]
  [1] NA NA NA NA  0  1  1 NA NA NA  1 NA  0 NA  0  1 NA NA  0 NA  0  1  0  0  0 NA  0  1  0  1  0  0  1  0  1  1  0  0  0  0
 [41]  1  0 NA  0  0 NA  1 NA  0  1 NA  0 NA  0  1  1 NA  1 NA  0  0  0  1  1 NA NA NA  0  0 NA  0  0  0  1  0 NA  1  0 NA  0
 [81]  1  1  0  1  1  0  1  0 NA  0  1  0  1  0 NA  0  1  0  0  1

I used this code to calculate the cumsums between NA values:

g <- cumsum(is.na(head(c(0, out), -1)))
out1 <- ave(out, g, FUN = cumsum)
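The grouping vector g works because it counts the NAs seen before each position. Each NA therefore ends its own group, and the value after it starts a new one, so the cumsum inside ave() restarts there. Here is a small sketch on a made-up vector x (hypothetical data, not the question's):

```r
x <- c(NA, 0, 1, 1, NA, 1, 0)

# lag x by one (the leading 0 stands in for "no previous value"),
# then count how many NAs occur before each position
g <- cumsum(is.na(head(c(0, x), -1)))
print(g)    # 0 1 1 1 1 2 2

# cumulative sum restarted within each group; each NA is the last member of its group
res <- ave(x, g, FUN = cumsum)
print(res)  # NA 0 1 2 NA 1 1
```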

I get

out1[1:100]
  [1] NA NA NA NA  0  1  2 NA NA NA  1 NA  0 NA  0  1 NA NA  0 NA  0  1  1  1  1 NA  0  1  1  2  2  2  3  3  4  5  5  5  5  5
 [41]  6  6 NA  0  0 NA  1 NA  0  1 NA  0 NA  0  1  2 NA  1 NA  0  0  0  1  2 NA NA NA  0  0 NA  0  0  0  1  1 NA  1  1 NA  0
 [81]  1  2  2  3  4  4  5  5 NA  0  1  1  2  2 NA  0  1  1  1  2

The remaining problem is that I want cumsum + 1 whenever a 1 is followed by a 0 (only for the first 0 after the 1), and that extra 1 should carry forward in the running total.

For example, for

0 1 1 0 0 0 1 0 1 1 NA

my current code gives

0 1 2 2 2 2 3 3 4 5 NA

but what I want is:

0 1 2 3 3 3 4 5 6 7 NA
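The desired numbers can be read as the cumsum of a per-element increment. Each increment is the value itself, plus 1 when a 0 directly follows a 1. Below is a minimal sketch of that idea, which reuses the NA-restarting ave() call from the question. The names prev, bump and inc are illustrative, not from the question:

```r
x <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 1, NA)

# previous element; the first position has no predecessor
prev <- c(NA, head(x, -1))

# 1 where a 0 directly follows a 1; %in% treats NA as no match,
# so no bonus is carried across an NA boundary
bump <- as.numeric(x %in% 0 & prev %in% 1)

# per-element increment, then the question's NA-restarting cumsum
inc <- x + bump
g <- cumsum(is.na(head(c(0, x), -1)))
res <- ave(inc, g, FUN = cumsum)
print(res)  # 0 1 2 3 3 3 4 5 6 7 NA
```

Applied to the 112-element out vector defined in the second answer below (with prev, bump and g computed from out), this gives the same values as the output printed there.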

Can anybody help? Thank you.

Upvotes: 3

Views: 329

Answers (2)

dimitris_ps

Reputation: 5951

There should be an easier way, but you can try this:

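temp1 <- out
temp1[is.na(temp1)] <- 0    # treat NA as 0 for the arithmetic

# 1 where a 0 directly follows a 1, else 0 (the first element has no predecessor)
temp2 <- c(0, (temp1[-1] == 0) * (temp1[-length(temp1)] == 1))

# running total, reset after every NA by subtracting the total reached at the latest NA
inc <- temp1 + temp2
out1 <- cumsum(inc) - cummax(cumsum(inc) * is.na(out))
out1[is.na(out)] <- NA

rm(temp1, temp2, inc)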

out1
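The line that computes out1 relies on a common trick for restarting a running total. At every position it subtracts the total that had been reached at the most recent NA. All increments are non-negative, so the cumsum never decreases, and cummax() carries the value from the latest NA forward until the next one. Here is a small sketch on a made-up vector y (hypothetical data):

```r
y <- c(1, 0, NA, 1, 1, NA, 1)

v <- y
v[is.na(v)] <- 0                   # NA contributes nothing to the total

total <- cumsum(v)                 # 1 1 1 2 3 3 4
at_na <- cummax(total * is.na(y))  # total reached at the latest NA: 0 0 1 1 1 3 3
res <- total - at_na               # 1 1 0 1 2 0 1
res[is.na(y)] <- NA                # 1 1 NA 1 2 NA 1
print(res)
```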

Upvotes: 0

konvas

Reputation: 14346

Try this:

out <- c(NA, NA, NA, NA, 0, 1, 1, NA, NA, NA, 1, NA, 0, NA, 0, 1, NA, 
    NA, 0, NA, 0, 1, 0, 0, 0, NA, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 
    0, 0, 0, 1, 0, NA, 0, 0, NA, 1, NA, 0, 1, NA, 0, NA, 0, 1, 1, 
    NA, 1, NA, 0, 0, 0, 1, 1, NA, NA, NA, 0, 0, NA, 0, 0, 0, 1, 0, 
    NA, 1, 0, NA, 0, 1, 1, 0, 1, 1, 0, 1, 0, NA, 0, 1, 0, 1, 0, NA, 
    0, 1, 0, 0, 1, NA, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, NA)

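as.numeric(unlist(lapply(split(out, cumsum(is.na(out))),
    # each piece is an NA followed by the values up to the next NA
    # (this assumes out starts with NA, as it does here)
    function(x) {
        if (length(x) == 1) return(x)  # a lone NA
        # positions i with x[i] == 1 and x[i + 1] == 0
        idx <- which(x[-length(x)] == 1 & x[-1] == 0)
        res <- cumsum(x[-1])  # res[i] corresponds to x[i + 1]
        # the stretch starting at the 0 of the i-th 1 -> 0 transition is shifted up by i
        for (i in seq_along(idx)) {
            if (i == length(idx))
                res[seq(idx[i], length(res))] <- res[seq(idx[i], length(res))] + i
            else
                res[seq(idx[i], idx[i + 1] - 1)] <- res[seq(idx[i], idx[i + 1] - 1)] + i
        }
        c(NA, res)
    }
)))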
#  [1] NA NA NA NA  0  1  2 NA NA NA  1 NA  0 NA  0  1 NA NA  0 NA  0  1  2  2  2
# [26] NA  0  1  2  3  4  4  5  6  7  8  9  9  9  9 10 11 NA  0  0 NA  1 NA  0  1
# [51] NA  0 NA  0  1  2 NA  1 NA  0  0  0  1  2 NA NA NA  0  0 NA  0  0  0  1  2
# [76] NA  1  2 NA  0  1  2  3  4  5  6  7  8 NA  0  1  2  3  4 NA  0  1  2  2  3
#[101] NA  0  1  2  3  3  3  4  5  6  7 NA

Alternatively, you can keep out1 from your existing method (using ave) and then add the missing offsets to it by identifying the stretches that need to be shifted:

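na.pos <- which(is.na(out))
# positions i with out[i] == 1 and out[i + 1] == 0
idx <- which(out[-length(out)] == 1 & out[-1] == 0)
# breakpoints: the same transitions, plus every position just before an NA
idx2 <- which(is.na(out[-1]) | (out[-length(out)] == 1 & out[-1] == 0))

# each stretch runs from the 0 after a transition to the next breakpoint
# (assumes an NA follows the last transition, as in the out above)
beg <- idx + 1
end <- idx2[findInterval(idx, idx2) + 1]
# k for the k-th transition within its NA-delimited block
to.add <- as.numeric(unlist(sapply(rle(findInterval(idx, na.pos))$lengths, seq, from = 1)))

for(i in seq_along(beg))
    out1[seq(beg[i], end[i])] <- out1[seq(beg[i], end[i])] + to.add[i]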
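The to.add line numbers the 1-to-0 transitions 1, 2, 3, ... within each NA-delimited block, because the k-th transition in a block shifts its stretch by k. Base R's sequence() builds the same numbering directly from run lengths. Here is a small sketch with a made-up block index (block is hypothetical; in the answer it comes from findInterval(idx, na.pos)):

```r
# hypothetical block number for each 1 -> 0 transition
block <- c(2, 2, 2, 5, 7, 7)

# running count of transitions within each block
to.add <- sequence(rle(block)$lengths)
print(to.add)  # 1 2 3 1 1 2
```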

Upvotes: 1
