user1491868

Reputation: 326

R diff() handling NA

I would like to calculate the first difference of a variable even when either the current value or the lagged value is missing (but not when both are). R's diff() function returns NA if either value is missing. Can this behavior be changed?

data <- c(5, NA, NA, 10, 25)

diff_i_want <- c(-5, NA, 10, 15)

diff_i_get <- diff(data)

identical(diff_i_want, diff_i_get)

Upvotes: 1

Views: 5036

Answers (2)

lebatsnok

Reputation: 6459

You can replace the NAs with zeros:

> x <- c(5, NA, NA, 10, 25)
> diff(replace(x, is.na(x), 0))
[1] -5  0 10 15

Admittedly, this differs from your diff_i_want, but I'm not sure of your logic. How do you get -5 as the first element of your answer? The only way to get there is to implicitly replace NA with zero. So if you make that replacement there, why not for the next element as well?

Though your desired answer doesn't make much sense to me, it is possible to obtain it e.g. using zoo::rollapply:

# first define a function that takes a vector of length 2
# ... and will output the difference if no more than 1 of the values is missing
weirddiff <- function(x) {
  if(any(is.na(x)) && !all(is.na(x))) x[is.na(x)] <- 0
  x[2] - x[1]
}

Now we can use rollapply with the window set to 2:

library(zoo)
rollapply(x, 2, weirddiff)
# [1] -5 NA 10 15
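If you would rather not depend on zoo, the same pairwise rule can be sketched in base R by comparing the vector with its own lag. This is only an illustration under the assumption that a lone NA in a pair should count as 0 and a pair of two NAs should stay NA; the name diff_na_as_zero is made up for this example.

```r
# Hypothetical helper (not from the answer above): a vectorised version
# of the rule implemented by weirddiff().
diff_na_as_zero <- function(x) {
  prev <- x[-length(x)]   # lagged values
  curr <- x[-1]           # current values
  # Remember the pairs where both values are missing
  both_missing <- is.na(prev) & is.na(curr)
  # Treat every remaining NA as 0 before subtracting
  prev[is.na(prev)] <- 0
  curr[is.na(curr)] <- 0
  out <- curr - prev
  # Pairs with two missing values stay NA
  out[both_missing] <- NA
  out
}

x <- c(5, NA, NA, 10, 25)
diff_na_as_zero(x)
# [1] -5 NA 10 15
```

Because it works on whole vectors rather than calling a function per window, this should also scale better on long inputs than the rolling version.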

Upvotes: 3

R. Schifini

Reputation: 9313

Here is a way:

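data <- c(5, NA, NA, 10, 25)
data2 <- data
data2[is.na(data2)] <- 0    # treat every NA as 0
diffData2 <- diff(data2)
# restore NA where both the current and the previous value were missing
diffData2[diff(is.na(data)) == 0 & is.na(data[-1])] <- NA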

> diffData2
[1] -5 NA 10 15

First copy the data to data2, set all NAs to 0, and take the diff. In the last step, put NA back into the result wherever both the current and the previous value were missing.
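To see why the logical index in the last step picks out exactly the pairs where both values are missing, it may help to build it piece by piece. The intermediate names (na, same_status, both_missing) are introduced here only for illustration.

```r
data <- c(5, NA, NA, 10, 25)
na <- is.na(data)               # FALSE TRUE TRUE FALSE FALSE

# diff() on a logical vector gives 0 where two neighbours have the
# same missingness status (both NA or both present)
same_status <- diff(na) == 0    # FALSE TRUE FALSE TRUE

# Requiring the current value to be NA as well leaves only the pairs
# in which both values are missing
both_missing <- same_status & na[-1]
both_missing
# [1] FALSE  TRUE FALSE FALSE

# Equivalent, and arguably more direct: previous NA and current NA
identical(both_missing, na[-length(na)] & na[-1])
# [1] TRUE
```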

Upvotes: 1
