timothy.s.lau
timothy.s.lau

Reputation: 1101

Interpolating when there's NA's

I have longitudinal data structured (in R) as so:

dat1 <- data.frame(county = c("a","a","a", "b","b","b", "c","c","c"), year = c(2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003), count = c(2, NA, 6, 4, NA, 8, 10, NA, 14))

I have a missing year I'd like to interpolate 2002. Which I made a function to do:

dat1[dat1$year == 2002,] <- apply(X = data.frame("2001" = dat1[dat1$year == 2001, "count"], "2003" = dat1[dat1$year == 2003, "count"]), MARGIN = 1, FUN = function(x){z <- data.frame(approx(x = x, method = "linear", n = length(2001:2003))$y); return(z[-c(1, nrow(z)),])})
dat1 # the desired result

But it only works when there isn't any NA's in the two years that I use to interpolate:

dat2 <- dat1
dat2[9,3] <- NA

apply(X = data.frame("2001" = dat2[dat2$year == 2001, "count"], "2003" = dat2[dat2$year == 2003, "count"]), MARGIN = 1, FUN = function(x){z <- data.frame(approx(x = x, method = "linear", n = length(2001:2003))$y); return(z[-c(1, nrow(z)),])})

I'm aware that I could just use complete.cases() to make it run but then I can't insert it back into the original data frame because the row lengths would differ.

Does anyone have a solution to my NA's in the preceding or following years problem or better solution to my kludge-tastic attempt?

Upvotes: 0

Views: 107

Answers (1)

G. Grothendieck
G. Grothendieck

Reputation: 269491

Return NA if there are only NAs, replace every NA with the single non-NA using na.aggregate if there is only one non-NA and use na.approx if there is more than one NA in a county.

library(zoo)

na_approx <- function(x) {
    if (all(is.na(x))) NA 
    else if (sum(!is.na(x)) == 1) na.aggregate(x)
    else na.approx(x, na.rm = FALSE)
}
transform(dat2, count = ave(count, county, FUN = na_approx))

giving:

  county year count
1      a 2001     2
2      a 2002     4
3      a 2003     6
4      b 2001     4
5      b 2002     6
6      b 2003     8
7      c 2001    10
8      c 2002    10
9      c 2003    10

Here is an alternative that can have slightly different behavior although in this example it gives the same result.

na_approx2 <- function(x) {
     if (sum(!is.na(x)) > 1) na.approx(x, na.rm = FALSE) 
     else na.locf(x, na.rm = FALSE)
}
transform(dat2, count = ave(count, county, FUN = na_approx2))

Upvotes: 2

Related Questions