Reputation: 99371
Consider the following named vector x.
( x <- setNames(c(1, 2, 0, NA, 4, NA, NA, 6), letters[1:8]) )
#  a  b  c  d  e  f  g  h
#  1  2  0 NA  4 NA NA  6
I'd like to calculate the cumulative sum of x while ignoring the NA values. Many R functions have an argument na.rm which removes NA elements prior to calculations. cumsum() is not one of them, which makes this operation a bit tricky.
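For reference, a minimal illustration of that behaviour on the same x: once cumsum() reaches the first NA, that element and every later one come back as NA.

```r
x <- setNames(c(1, 2, 0, NA, 4, NA, NA, 6), letters[1:8])

# cumsum() has no na.rm argument; the NA at "d" propagates to the end
res <- cumsum(x)
res
#  a  b  c  d  e  f  g  h
#  1  3  3 NA NA NA NA NA
```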
I can do it this way.
y <- setNames(numeric(length(x)), names(x))
z <- cumsum(na.omit(x))
y[names(y) %in% names(z)] <- z
y[!names(y) %in% names(z)] <- x[is.na(x)]
y
#  a  b  c  d  e  f  g  h
#  1  3  3 NA  7 NA NA 13
But this seems excessive, and makes a lot of new assignments/copies. I'm sure there's a better way.
What better methods are there to return the cumulative sum while effectively ignoring NA values?
Upvotes: 64
Views: 43233
Reputation: 17001
I benchmarked several options; collapse::fcumsum is the fastest by far.
library(dplyr)
library(tidyr)
library(collapse)
x <- runif(1e5)
x[sample(1e5, 1e4)] <- NA
microbenchmark::microbenchmark(
  ifelse = cumsum(ifelse(is.na(x), 0, x)) + x*0,
  coalesce = cumsum(coalesce(x, 0)) + x*0,
  na.omit = "[<-"(x, !is.na(x), cumsum(na.omit(x))),
  is.na = local({b <- !is.na(x); "[<-"(x, b, cumsum(x[b]))}),
  fcumsum = fcumsum(x),
  check = "equal"
)
#> Unit: microseconds
#>      expr    min      lq     mean  median      uq    max neval
#>    ifelse 1808.4 2672.40 3290.323 2853.80 3178.25 8807.5   100
#>  coalesce 2575.8 3543.45 4427.820 3890.20 5344.55 8142.4   100
#>   na.omit 1314.6 2056.25 2547.983 2231.50 2467.40 6259.2   100
#>     is.na  910.5 1472.50 2020.346 1698.80 1955.75 5431.0   100
#>   fcumsum  137.2  255.35  282.999  267.15  313.75  513.4   100
Upvotes: 0
Reputation: 41601
Another option is to use the fcumsum function from the collapse package, like this:
( x <- setNames(c(1, 2, 0, NA, 4, NA, NA, 6), letters[1:8]) )
#>  a  b  c  d  e  f  g  h
#>  1  2  0 NA  4 NA NA  6
library(collapse)
fcumsum(x)
#>  a  b  c  d  e  f  g  h
#>  1  3  3 NA  7 NA NA 13
Created on 2022-08-24 with reprex v2.0.2
Upvotes: 2
Reputation: 4873
It's an old question, but tidyr offers a newer solution, based on the idea of replacing NA with zero.
library(tidyr)
cumsum(replace_na(x, 0))
#  a  b  c  d  e  f  g  h
#  1  3  3  3  7  7  7 13
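Note that zero-filling alone carries the running total through the NA positions (3 at d, 7 at f and g), which differs from the NA-preserving output asked for in the question. If NA should stay NA, one option is to mask those positions afterwards. A minimal sketch, assuming base R's replace() as a stand-in for tidyr::replace_na() so that it runs without extra packages (the names filled and masked are only illustrative):

```r
x <- setNames(c(1, 2, 0, NA, 4, NA, NA, 6), letters[1:8])

# Zero-fill, then accumulate: the running total is carried through NA positions
filled <- cumsum(replace(x, is.na(x), 0))

# Put NA back wherever the input was missing
masked <- replace(filled, is.na(x), NA)
masked
#  a  b  c  d  e  f  g  h
#  1  3  3 NA  7 NA NA 13
```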
Upvotes: 37
Reputation: 44340
You can do this in one line with:
cumsum(ifelse(is.na(x), 0, x)) + x*0
#  a  b  c  d  e  f  g  h
#  1  3  3 NA  7 NA NA 13
Or, similarly:
library(dplyr)
cumsum(coalesce(x, 0)) + x*0
#  a  b  c  d  e  f  g  h
#  1  3  3 NA  7 NA NA 13
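The + x*0 term is what restores the NA values: x*0 is 0 wherever x is present and NA wherever x is missing, so adding it leaves the running total unchanged at observed positions and turns it into NA elsewhere. A short sketch that splits the one-liner into steps (mask, total, y and res_inf are illustrative names), plus one caveat worth knowing: Inf * 0 is NaN, so the trick may not be suitable for data containing infinite values.

```r
x <- setNames(c(1, 2, 0, NA, 4, NA, NA, 6), letters[1:8])

mask  <- x * 0                            # 0 where x is observed, NA where x is missing
total <- cumsum(ifelse(is.na(x), 0, x))   # zero-filled running total
res   <- total + mask                     # NA + number is NA, so missing positions return
res
#  a  b  c  d  e  f  g  h
#  1  3  3 NA  7 NA NA 13

# Caveat: Inf * 0 is NaN, so an infinite value shows up as NaN at its own position
y <- c(1, Inf, 2)
res_inf <- cumsum(ifelse(is.na(y), 0, y)) + y * 0
res_inf
# [1]   1 NaN Inf
```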
Upvotes: 56
Reputation: 99371
Here's a function I came up with from the answers to this question. Thought I'd share it, since it seems to work well so far. It calculates the cumulative FUNC of x while ignoring NA. FUNC can be any one of sum(), prod(), min(), or max(), and x is a numeric vector.
cumSkipNA <- function(x, FUNC)
{
    d <- deparse(substitute(FUNC))
    funs <- c("max", "min", "prod", "sum")
    stopifnot(is.vector(x), is.numeric(x), d %in% funs)
    FUNC <- match.fun(paste0("cum", d))
    x[!is.na(x)] <- FUNC(x[!is.na(x)])
    x
}
set.seed(1)
x <- sample(15, 10, TRUE)
x[c(2,7,5)] <- NA
x
# [1] 4 NA 9 14 NA 14 NA 10 10 1
cumSkipNA(x, sum)
# [1] 4 NA 13 27 NA 41 NA 51 61 62
cumSkipNA(x, prod)
# [1] 4 NA 36 504 NA 7056 NA
# [8] 70560 705600 705600
cumSkipNA(x, min)
# [1] 4 NA 4 4 NA 4 NA 4 4 1
cumSkipNA(x, max)
# [1] 4 NA 9 14 NA 14 NA 14 14 14
Definitely nothing new, but maybe useful to someone.
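Because cumSkipNA() reads the function name with deparse(substitute(FUNC)), it only accepts a bare name such as sum; something like base::sum, or a variable that holds the function, fails the stopifnot() check. A possible variant, sketched here under the assumption that passing the name as a string is acceptable, uses match.arg() instead. The name cum_skip_na and its fun argument are hypothetical and not part of the answer above.

```r
# Hypothetical variant of cumSkipNA(): the operation is chosen by a string
cum_skip_na <- function(x, fun = c("sum", "prod", "min", "max")) {
  fun <- match.arg(fun)                     # validates the choice; defaults to "sum"
  stopifnot(is.numeric(x))
  cum_fun <- match.fun(paste0("cum", fun))  # cumsum, cumprod, cummin or cummax
  ok <- !is.na(x)
  x[ok] <- cum_fun(x[ok])                   # accumulate over the non-missing values only
  x
}

v <- c(4, NA, 9, 14, NA, 14, NA, 10, 10, 1)
cum_skip_na(v, "sum")
# [1]  4 NA 13 27 NA 41 NA 51 61 62
cum_skip_na(v, "max")
# [1]  4 NA  9 14 NA 14 NA 14 14 14
```

Since the choice is an ordinary character value, it can also be stored in a variable or looped over, which the substitute() approach does not allow.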
Upvotes: 12
Reputation: 6479
Do you want something like this?
x2 <- x
x2[!is.na(x)] <- cumsum(x2[!is.na(x)])
x2
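[edit] Alternatively, as suggested in a comment above, you can change the NAs to 0s:
miss <- is.na(x)   # remember where the NAs were
x[miss] <- 0       # note: this overwrites x; use a copy if the original is still needed
cs <- cumsum(x)
cs[miss] <- NA     # restore NA at the originally missing positions
# cs is the requested cumsum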
Upvotes: 30