Reputation: 326
I would like to calculate the first difference of a variable even when either the current value or the lagged value is missing. R's diff() function returns NA if either value is missing. Can this behavior be changed?
data <- c(5, NA, NA, 10, 25)
diff_i_want <- c(-5, NA, 10, 15)
diff_i_get <- diff(data)
identical(diff_i_want, diff_i_get)
Upvotes: 1
Views: 5036
Reputation: 6459
You can replace the NAs with zeros:
x <- c(5, NA, NA, 10, 25)
> diff("[<-"(x, is.na(x), 0))
[1] -5 0 10 15
Admittedly, this is different from your diff_i_want, but I'm not sure of your logic. How do you get -5 as the first element of your answer? Why -5? The only way to get there is to implicitly replace NA by zero. So if you do this replacement there, why don't you replace the next element as well?
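To make that point concrete, here is a minimal sketch using the same x as above. It only illustrates ordinary R arithmetic with NA and is not a proposed solution:

```r
x <- c(5, NA, NA, 10, 25)

# Arithmetic involving a missing value stays missing
na_result <- x[2] - x[1]

# -5 only appears once the NA in position 2 is treated as 0
zero_result <- 0 - x[1]

is.na(na_result)
zero_result
```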
Though your desired answer doesn't make much sense to me, it is possible to obtain it, e.g. using zoo::rollapply:
# first define a function that takes a vector of length 2
# ... and will output the difference if no more than 1 of the values is missing
weirddiff <- function(x) {
  # if exactly one of the two values is missing, treat it as 0
  if (any(is.na(x)) && !all(is.na(x))) x[is.na(x)] <- 0
  x[2] - x[1]
}
Now we can use rollapply with the window set to 2:
library(zoo)
rollapply(x,2,weirddiff)
[1] -5 NA 10 15
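If loading zoo is not desirable, the same rule can probably be expressed in vectorized base R. The following is only a sketch: the helper name diff_na_zero is hypothetical (it is not part of base R or zoo), and it assumes the intended rule is "treat a single missing value in a pair as 0, return NA only when both values are missing".

```r
# Hypothetical helper: first difference that treats a lone NA in a pair as 0
# and returns NA only when both values of the pair are missing.
diff_na_zero <- function(x) {
  prev <- x[-length(x)]                    # lagged values
  curr <- x[-1]                            # current values
  both_missing <- is.na(prev) & is.na(curr)
  prev[is.na(prev)] <- 0                   # lone NAs count as 0
  curr[is.na(curr)] <- 0
  out <- curr - prev
  out[both_missing] <- NA                  # restore NA where nothing was observed
  out
}

x <- c(5, NA, NA, 10, 25)
diff_na_zero(x)
```

For the example vector this should give the same result as the rollapply call, without calling a function once per window.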
Upvotes: 3
Reputation: 9313
Here is a way:
data <- c(5, NA, NA, 10, 25)
data2 = data
data2[is.na(data2)] = 0
diffData2 = diff(data2)
diffData2[diff(is.na(data))==0 & is.na(data[-1])] = NA
> diffData2
[1] -5 NA 10 15
First make a copy of the data in data2, set all NAs to 0, and then take the diff. In the last step, put NA back into the calculated diff wherever both the current and the previous value of the original data are missing.
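For reuse, the steps above could be wrapped in a function and checked against the diff_i_want vector from the question. This is a sketch; the function name diff_keep_double_na is made up for illustration.

```r
# Hypothetical wrapper around the steps described above
diff_keep_double_na <- function(data) {
  data2 <- data
  data2[is.na(data2)] <- 0          # treat every NA as 0
  out <- diff(data2)                # ordinary first difference
  # A pair is entirely missing when the NA status does not change between
  # neighbours and the later value is NA.
  out[diff(is.na(data)) == 0 & is.na(data[-1])] <- NA
  out
}

data <- c(5, NA, NA, 10, 25)
diff_i_want <- c(-5, NA, 10, 15)
identical(diff_keep_double_na(data), diff_i_want)
```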
Upvotes: 1