D Kincaid
D Kincaid

Reputation: 281

How do I limit x-range of spline() interpolation to first and last non-NA value in dplyr?

I want to interpolate missing values using dplyr, piping, and spline().

Data:

test <- structure(list(site = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 1L, 
    1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("lake", "stream", "wetland"
    ), class = "factor"), depth = c(0L, -3L, -4L, -8L, -10L, -14L, 
    0L, -1L, -3L, -5L, 0L, -2L, -4L, -6L), var1 = c(NA, 1L, 3L, NA, 
    6L, NA, 1L, 2L, NA, 4L, 1L, NA, NA, 4L), var2 = c(1L, NA, 3L, 
    4L, 8L, NA, NA, NA, NA, NA, NA, 2L, NA, NA)), .Names = c("site", 
    "depth", "var1", "var2"), class = "data.frame", row.names = c(NA, 
    -14L))

Q1: How do I use the following functioning code, but limit the range of interpolation to occur between the first non-NA value and the last non-NA value for each variable. For example, it should only interpolate var1 for wetland at depth -8 and return NA for depths 0 and -14.

library(tidyverse)

test_int <- test %>% 
    group_by(site) %>% 
    mutate_at(vars(c(var1, var2)),
              funs("i" = if(sum(!is.na(.)) > 1) 
                             spline(x=depth, y=., xout=depth)[["y"]]
                         else
                             NA)) 

Q2: Is there a way to bound my interpolated values from 0 to Inf? Or is this not appropriate with spline (e.g., I should use another interpolation method such as smooth or loess)?

Upvotes: 3

Views: 571

Answers (1)

leerssej
leerssej

Reputation: 14988

Not pretty, but capable of filtering out the excess values. Side effect is that it filters out interpolated values beyond the min and max limits as well.

test_clean <- 
    test %>% 
    group_by(site) %>% 
    mutate_at(vars(c(var1, var2)),
              funs(c("c" = if(sum(!is.na(.)) > 1) 
                            spline(x=depth, y=., xout=depth)[["y"]]
                        else NA),
                    "min" = min(., na.rm = TRUE),
                    "max" = max(., na.rm = TRUE)
                   )
              ) %>% 
    mutate(var1_i = if_else(var1_c >= var1_min & var1_c <= var1_max, var1_c, NA_real_),
           var2_i = if_else(var2_c >= var2_min & var2_c <= var2_max, var2_c, NA_real_)) %>% 
    select(site:var2, ends_with("i"))

test_clean
# A tibble: 14 x 6
# Groups:   site [3]
      site depth  var1  var2 var1_i   var2_i
    <fctr> <int> <int> <int>  <dbl>    <dbl>
 1 wetland     0    NA     1     NA 1.000000
 2 wetland    -3     1    NA    1.0 3.078125
 3 wetland    -4     3     3    3.0 3.000000
 4 wetland    -8    NA     4     NA 4.000000
 5 wetland   -10     6     8    6.0 8.000000
 6 wetland   -14    NA    NA     NA       NA
 7    lake     0     1    NA    1.0       NA
 8    lake    -1     2    NA    2.0       NA
 9    lake    -3    NA    NA    3.4       NA
10    lake    -5     4    NA    4.0       NA
11  stream     0     1    NA    1.0       NA
12  stream    -2    NA     2    2.0       NA
13  stream    -4    NA    NA    3.0       NA
14  stream    -6     4    NA    4.0       NA

and to help everyone working on improving this or proofing the steps that took place on the way to the final dataframe, here's the dataframe with the intermediate steps included:

# A tibble: 14 x 12
# Groups:   site [3]
      site depth  var1  var2     var1_c    var2_c var1_min var2_min var1_max var2_max var1_i   var2_i
    <fctr> <int> <int> <int>      <dbl>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>  <dbl>    <dbl>
 1 wetland     0    NA     1 -7.5714286  1.000000        1        1        6        8     NA 1.000000
 2 wetland    -3     1    NA  1.0000000  3.078125        1        1        6        8    1.0 3.078125
 3 wetland    -4     3     3  3.0000000  3.000000        1        1        6        8    3.0 3.000000
 4 wetland    -8    NA     4  6.7142857  4.000000        1        1        6        8     NA 4.000000
 5 wetland   -10     6     8  6.0000000  8.000000        1        1        6        8    6.0 8.000000
 6 wetland   -14    NA    NA -0.5714286 30.750000        1        1        6        8     NA       NA
 7    lake     0     1    NA  1.0000000        NA        1      Inf        4     -Inf    1.0       NA
 8    lake    -1     2    NA  2.0000000        NA        1      Inf        4     -Inf    2.0       NA
 9    lake    -3    NA    NA  3.4000000        NA        1      Inf        4     -Inf    3.4       NA
10    lake    -5     4    NA  4.0000000        NA        1      Inf        4     -Inf    4.0       NA
11  stream     0     1    NA  1.0000000        NA        1        2        4        2    1.0       NA
12  stream    -2    NA     2  2.0000000        NA        1        2        4        2    2.0       NA
13  stream    -4    NA    NA  3.0000000        NA        1        2        4        2    3.0       NA
14  stream    -6     4    NA  4.0000000        NA        1        2        4        2    4.0       NA

Upvotes: 1

Related Questions