Nazer
Nazer

Reputation: 3764

Interpolating a spline within dplyr

I am trying interpolate splines for the following example data:

trt    depth    root    carbon
A       2        1        14
A       4        2        18
A       6        3        18
A       8        3        17
A      10        1        12
B       2        3        16
B       4        4        18
B       6        4        17
B       8        2        15
B      10        1        12

in the following way:

new_df<-df%>%
  group_by(trt)%>%
  summarise_each(funs(splinefun(., x=depth, method="natural")))

I get an Error: not a vector, but I don't see why not. Am I not expressing the function in the right way?

Upvotes: 4

Views: 2379

Answers (2)

wibeasley
wibeasley

Reputation: 5287

Do you want a dataset that contains the values interpolated? If so, I've expanded the dataset to contain the desired x locations before the splines are calculated.

The resolution of those points are determined in the second line of the expand.grid() function. Just make sure the original depth points are a subset of the expanded depth points (eg, don't use something uneven like by=.732).

library(magrittr)
ds <- readr::read_csv("trt,depth,root,carbon\nA,2,1,14\nA,4,2,18\nA,6,3,18\nA,8,3,17\nA,10,1,12\nB,2,3,16\nB,4,4,18\nB,6,4,17\nB,8,2,15\nB,10,1,12")

ds_depths_possible <- expand.grid(
  depth            = seq(from=min(ds$depth), max(ds$depth), by=.5), #Decide resolution here.
  trt              = c("A", "B"),
  stringsAsFactors = FALSE
  )

ds_intpolated <- ds %>% 
  dplyr::right_join(ds_depths_possible, by=c("trt", "depth")) %>% #Incorporate locations to interpolate
  dplyr::group_by(trt) %>% 
  dplyr::mutate(
    root_interpolated     = spline(x=depth, y=root  , xout=depth)$y,
    carbon_interpolated   = spline(x=depth, y=carbon, xout=depth)$y
  ) %>% 
  dplyr::ungroup()
ds_intpolated

Output:

Source: local data frame [34 x 6]

     trt depth  root carbon root_interpolated carbon_interpolated
   (chr) (dbl) (int)  (int)             (dbl)               (dbl)
1      A   2.0     1     14          1.000000            14.00000
2      A   2.5    NA     NA          1.195312            15.57031
3      A   3.0    NA     NA          1.437500            16.72917
4      A   3.5    NA     NA          1.710938            17.52344
5      A   4.0     2     18          2.000000            18.00000
6      A   4.5    NA     NA          2.289062            18.21094
7      A   5.0    NA     NA          2.562500            18.22917
8      A   5.5    NA     NA          2.804688            18.13281
9      A   6.0     3     18          3.000000            18.00000
10     A   6.5    NA     NA          3.132812            17.88281
..   ...   ...   ...    ...               ...                 ...

root carbon

In the graphs above, the little points & lines are interpolated. The big fat points are observed.

library(ggplot2)
ggplot(ds_intpolated, aes(x=depth, y=root_interpolated, color=trt)) +
  geom_line() +
  geom_point(shape=1) +
  geom_point(aes(y=root), size=5, alpha=.3, na.rm=T) +
  theme_bw()

ggplot(ds_intpolated, aes(x=depth, y=carbon_interpolated, color=trt)) +
  geom_line() +
  geom_point(shape=1) +
  geom_point(aes(y=carbon), size=5, alpha=.3, na.rm=T) +
  theme_bw()

If you want an additional example, here's some recent code and slides. We needed a rolling median for some missing points, and linear stats::approx() for some others. Another option is also stats::loess(), but it's arguments aren't as similar as approx() and spline().

Upvotes: 6

IRTFM
IRTFM

Reputation: 263301

I gave up trying to get dplyr::summarise_each (and also tried dplyr::summarise, since your choice of functions didn't seem to match you desire for multiple column input to return only two functions.) I'm not sure it's possible in dply. Here's what might be called the canonical method of approaching this:

 lapply( split(df, df$trt), function(d) splinefun(x=d$depth, y=d$carbon) )
#-------------

$A
function (x, deriv = 0L) 
{
    deriv <- as.integer(deriv)
    if (deriv < 0L || deriv > 3L) 
        stop("'deriv' must be between 0 and 3")
    if (deriv > 0L) {
        z0 <- double(z$n)
        z[c("y", "b", "c")] <- switch(deriv, list(y = z$b, b = 2 * 
            z$c, c = 3 * z$d), list(y = 2 * z$c, b = 6 * z$d, 
            c = z0), list(y = 6 * z$d, b = z0, c = z0))
        z[["d"]] <- z0
    }
    res <- .splinefun(x, z)
    if (deriv > 0 && z$method == 2 && any(ind <- x <= z$x[1L])) 
        res[ind] <- ifelse(deriv == 1, z$y[1L], 0)
    res
}
<bytecode: 0x7fe56e4853f8>
<environment: 0x7fe56efd3d80>

$B
function (x, deriv = 0L) 
{
    deriv <- as.integer(deriv)
    if (deriv < 0L || deriv > 3L) 
        stop("'deriv' must be between 0 and 3")
    if (deriv > 0L) {
        z0 <- double(z$n)
        z[c("y", "b", "c")] <- switch(deriv, list(y = z$b, b = 2 * 
            z$c, c = 3 * z$d), list(y = 2 * z$c, b = 6 * z$d, 
            c = z0), list(y = 6 * z$d, b = z0, c = z0))
        z[["d"]] <- z0
    }
    res <- .splinefun(x, z)
    if (deriv > 0 && z$method == 2 && any(ind <- x <= z$x[1L])) 
        res[ind] <- ifelse(deriv == 1, z$y[1L], 0)
    res
}
<bytecode: 0x7fe56e4853f8>
<environment: 0x7fe56efc4db8>

Upvotes: 0

Related Questions