Reputation: 1845
I have a vector of numbers with NAs, and I want to create a function that linearly interpolates the NAs given the numbers before and after them. Sometimes the function will need to generate one number, other times more than one.
x <- c(1, NA, NA, 4, 5, 6, NA, 7, 8, NA, NA, NA, NA, 13, 14, 15)
As a first step, is this even what you'd use to linearly interpolate the first two NAs between 1 and 4?
approx(x = c(1,4), n = 2, method="linear")$x
n in the static example is the number of NAs to be filled, and a, b are the values on either side of the NAs?
interpolate <- function(a, b, n) {
approx(x = c(a, b), n = n, method="linear")$x
}
If this is even the right approach, how do I put this all together and apply it to the entire vector? How would others approach this problem? Any help appreciated!
Upvotes: 1
Views: 751
Reputation: 1488
Please find my solution written as the following four functions:
linear_int_NA <- function(vec)
{
  # Element k of BM_list flags the positions where a run of at least k NAs starts
  BM_list = Reduce(f = function(x, y) bitmagic(x), x = 0:(length(vec)-2), init = is.na(vec), accumulate = TRUE)
  names(BM_list) = 1:(length(BM_list))
  # Longest runs first; keep only the run lengths that actually occur
  BM_proc = Filter(length, lapply(rev(BM_list), function(x) which(x)))
  blocks = readBM(BM_proc)
  replace_x_lint(vec, blocks)
}
bitmagic <- function(bin)
{
  # TRUE where an element and its right-hand neighbour are both TRUE
  bin[-length(bin)] & bin[-1]
}
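readBM <- function(BM_proc)
{
  # Each row of blocks holds the start and end index of one run of NAs
  blocks = matrix(NA, nrow = 0, ncol = 2)
  # BM_proc is ordered longest run first, so its first element always
  # describes a maximal run that has not been recorded yet
  while(length(BM_proc) > 0)
  {
    # Use only the first start position, so that several runs of the
    # same length are recorded one at a time
    start = BM_proc[[1]][1]
    row = c(start, start + as.numeric(names(BM_proc)[1]) - 1)
    blocks = rbind(blocks, row)
    # Drop every start position that falls inside the block just recorded
    # (lapply rather than sapply, so the result is always a named list)
    BM_proc = lapply(BM_proc, function(x) x[!(x %in% (row[1]):row[2])])
    BM_proc = Filter(length, BM_proc)
  }
  return(blocks)
}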
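replace_x_lint <- function(vec, blocks)
{
  # For each block, interpolate between the known values on either side;
  # the first and last interpolated points are those known values, so drop them.
  # seq_len() keeps this a no-op when there are no NA blocks at all.
  l_int = lapply(seq_len(nrow(blocks)), function(x) approx(x = vec[c(blocks[x,1] - 1, blocks[x,2] + 1)],
    n = blocks[x,2] - blocks[x,1] + 3, method="linear")$y[-c(1, blocks[x,2] - blocks[x,1] + 3)])
  for(i in seq_len(nrow(blocks)))
  {
    vec[blocks[i,1]:blocks[i,2]] <- l_int[[i]]
  }
  return(vec)
}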
To see them in action, for example:
vec <- c(1, NA, NA, 4, 5, 6, NA, 7, 8, NA, NA, NA, NA, 13, 14, 15)
vec # 1 NA NA 4 5 6 NA 7 8 NA NA NA NA 13 14 15
linear_int_NA(vec) # 1.0 2.0 3.0 4.0 5.0 6.0 6.5 7.0 8.0 9.0 10.0 11.0 12.0 13.0 14.0 15.0
This looks like the following:
plot(1:length(vec), linear_int_NA(vec), col = as.factor(is.na(vec)), pch = 19)
The logic behind is:
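A rough trace of the run-detection step on the example vector, reconstructed from the code above rather than taken from the original explanation. bitmagic is repeated so the snippet runs on its own; na_flags and the level variables are illustrative names only.

```r
# Same helper as in the answer: TRUE where two neighbouring flags are both TRUE
bitmagic <- function(bin) bin[-length(bin)] & bin[-1]

vec <- c(1, NA, NA, 4, 5, 6, NA, 7, 8, NA, NA, NA, NA, 13, 14, 15)

na_flags <- is.na(vec)          # level 1: positions that are NA
level2   <- bitmagic(na_flags)  # level 2: starts of runs of at least 2 NAs
level3   <- bitmagic(level2)    # level 3: starts of runs of at least 3 NAs
level4   <- bitmagic(level3)    # level 4: starts of runs of at least 4 NAs

which(na_flags)  # 2 3 7 10 11 12 13
which(level2)    # 2 10 11 12
which(level3)    # 10 11
which(level4)    # 10
```

The deepest level that still contains a TRUE gives the longest run (here 4 NAs starting at position 10). readBM records that block, discards the shorter starts it covers, and repeats with what is left.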
Also, bear in mind that this function will fail if the first or last element of the vector is NA (linear interpolation needs a known value on both sides of the gap).
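If NAs at the ends are a possibility, one alternative is to pass the positions of the known values to a single approx call and let its rule argument decide what happens outside the known range. This is only a sketch of that alternative, not part of the functions above; vec_edge, idx, known, keep_na and carry are illustrative names.

```r
vec_edge <- c(NA, 2, NA, 4, NA)
idx   <- seq_along(vec_edge)   # positions 1..n act as the x axis
known <- !is.na(vec_edge)      # approx needs at least two known values

# rule = 1 (the default): positions outside the known range stay NA
keep_na <- approx(x = idx[known], y = vec_edge[known], xout = idx, rule = 1)$y
# rule = 2: positions outside the known range take the nearest known value
carry   <- approx(x = idx[known], y = vec_edge[known], xout = idx, rule = 2)$y

keep_na  # NA  2  3  4 NA
carry    #  2  2  3  4  4
```

Interior gaps are filled linearly in both cases; only the treatment of the leading and trailing NAs differs.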
Please cite if you employ the function in an academic environment. I am very happy to answer any questions you have about the code. Best,
Ventrilocus.
Upvotes: 2