MayaGans

Reputation: 1845

Linear Interpolation over a vector in R

I have a vector of numbers with NAs, and I want to create a function that linearly interpolates the NAs from the numbers before and after them. Sometimes the function will need to generate one number, sometimes more than one.

x <- c(1, NA, NA, 4, 5, 6, NA, 7, 8, NA, NA, NA, NA, 13, 14, 15)

First step: is this even what you'd use to linearly interpolate the first two NAs between 1 and 4?

approx(x = c(1,4), n = 2, method="linear")$x
Next, once you have a function, can you apply it to the entire vector, where n (fixed at 2 in the static example) is the number of NAs to fill and a, b are the values on either side of the NAs?
interpolate <- function(a, b, n) {
  approx(x = c(a, b), n = n, method="linear")$x
}

If this is even the right approach, how do I put it all together and apply it to the entire vector? How would others approach this problem? Any help appreciated!

Upvotes: 1

Views: 751

Answers (1)

Ventrilocus

Reputation: 1488

Please find my solution written as the following four functions:

linear_int_NA <- function(vec)
{
  BM_list = Reduce(f = function(x, y) bitmagic(x), x = 0:(length(vec)-2), init = is.na(vec), accumulate = TRUE)
  names(BM_list) = 1:(length(BM_list))
  BM_proc = Filter(length, lapply(rev(BM_list), function(x) which(x)))
  blocks = readBM(BM_proc)
  replace_x_lint(vec, blocks)
}

bitmagic <- function(bin)
{
  bin[-length(bin)] & bin[-1]
}

readBM <- function(BM_proc)
{
  blocks = matrix(NA, nrow = 0, ncol = 2); i = 1
  while(i <= length(BM_proc))
  {
    if(length(BM_proc[[i]]) > 0)
    {
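      # Take one block at a time: several blocks of equal length share this level.
      start = BM_proc[[i]][1]
      row = c(start, start + as.numeric(names(BM_proc)[i]) - 1)
      blocks = rbind(blocks, row)
      # lapply (not sapply) keeps pos a list even when all elements have equal length.
      pos = lapply(BM_proc, function(x) !(x %in% (row[1]):row[2]))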
      Names = names(BM_proc)
      BM_proc = lapply(1:length(BM_proc), function(x) BM_proc[[x]][pos[[x]]])
      names(BM_proc) = Names
      BM_proc = Filter(length, BM_proc)
    }
  }
  return(blocks)
}

replace_x_lint <- function(vec, blocks)
{
  
  # seq_len() avoids iterating over 1:0 when the vector contains no NAs.
  l_int = lapply(seq_len(nrow(blocks)), function(x) approx(x = vec[c(blocks[x,1] - 1, blocks[x,2] + 1)], 
                                                    n = blocks[x,2] - blocks[x,1] + 3, method="linear")$y[-c(1, blocks[x,2] - blocks[x,1] + 3)])
  for(i in seq_len(nrow(blocks)))
  {
    vec[blocks[i,1]:blocks[i,2]] <- l_int[[i]]
  }
  return(vec)
}

To see them in action, for example:

vec <- c(1, NA, NA, 4, 5, 6, NA, 7, 8, NA, NA, NA, NA, 13, 14, 15)
vec                # 1    NA   NA   4    5    6    NA   7    8    NA  NA   NA   NA   13   14   15
linear_int_NA(vec) # 1.0  2.0  3.0  4.0  5.0  6.0  6.5  7.0  8.0  9.0 10.0 11.0 12.0 13.0 14.0 15.0

This looks like the following:

plot(1:length(vec), linear_int_NA(vec), col = as.factor(is.na(vec)), pch = 19)

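[Figure: output visualization, a scatter plot of the interpolated values against index, colored by whether the original value was NA]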

The logic behind it is:

  1. is.na(vec) converts the vector into a binary chain (TRUE where a value is NA).
  2. bitmagic is applied repeatedly; from the resulting list (with some care) it is possible to find the positions of the longest chains of 1s, which correspond to the longest chains of NAs (the so-called blocks).
  3. Once the blocks are identified, linear interpolation is applied to fill in the NAs.
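
To make steps 1 and 2 more concrete, here is a minimal sketch on the example vector. The bitmagic function is copied from above; the names na_levels, run_starts and na_blocks are only illustrative. The rle() part is an assumed alternative for cross-checking the blocks, not the method used in the functions above.

```r
# Copied from the answer: position p stays TRUE only if p and p + 1 are both TRUE.
bitmagic <- function(bin) bin[-length(bin)] & bin[-1]

vec <- c(1, NA, NA, 4, 5, 6, NA, 7, 8, NA, NA, NA, NA, 13, 14, 15)

# na_levels[[k]] is TRUE at p when vec[p], ..., vec[p + k - 1] are all NA.
na_levels <- Reduce(function(acc, step) bitmagic(acc),
                    seq_len(length(vec) - 1),
                    init = is.na(vec), accumulate = TRUE)
run_starts <- lapply(na_levels, which)
run_starts[[2]]  # 2 10 11 12: starts of NA runs of length >= 2
run_starts[[4]]  # 10: the only NA run of length >= 4

# Assumed alternative (not the answer's method): rle() yields the blocks directly.
runs   <- rle(is.na(vec))
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1L
na_blocks <- cbind(start = starts[runs$values], end = ends[runs$values])
na_blocks        # rows: (2, 3), (7, 7), (10, 13)
```

Both routes should agree on the same three blocks here; the rle() version simply skips the level-by-level bookkeeping.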

Also, bear in mind that this function will fail if the first or last element of the vector is an NA (you cannot apply linear interpolation without a defined range).
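
If leading or trailing NAs are a concern, one possible workaround is to let base R's approx() work on the index positions instead of on individual blocks. The following is only a sketch: interpolate_na is a hypothetical helper name, not part of the functions above, and it assumes the vector holds at least two non-NA values (approx() raises an error otherwise).

```r
# Hypothetical helper (not from the answer): fill interior NAs via stats::approx().
interpolate_na <- function(vec) {
  idx <- seq_along(vec)
  # approx() drops the (index, NA) pairs, interpolates at every index, and with
  # its default rule = 1 returns NA outside the range of observed indices.
  approx(x = idx, y = vec, xout = idx, method = "linear")$y
}

vec <- c(1, NA, NA, 4, 5, 6, NA, 7, 8, NA, NA, NA, NA, 13, 14, 15)
interpolate_na(vec)                  # 1 2 3 4 5 6 6.5 7 8 9 10 11 12 13 14 15
interpolate_na(c(NA, 1, NA, 3, NA))  # NA 1 2 3 NA: edge NAs are left as NA
```

With this approach the edge NAs are left untouched rather than causing an error, which may or may not be what you want.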

Please cite if you employ the function in an academic environment. I am very happy to answer any questions you have about the code. Best,

Ventrilocus.

Upvotes: 2
