Reputation: 3764
I am trying interpolate splines for the following example data:
trt depth root carbon
A 2 1 14
A 4 2 18
A 6 3 18
A 8 3 17
A 10 1 12
B 2 3 16
B 4 4 18
B 6 4 17
B 8 2 15
B 10 1 12
in the following way:
new_df<-df%>%
group_by(trt)%>%
summarise_each(funs(splinefun(., x=depth, method="natural")))
I get an Error: not a vector
, but I don't see why not. Am I not expressing the function in the right way?
Upvotes: 4
Views: 2379
Reputation: 5287
Do you want a dataset that contains the values interpolated? If so, I've expanded the dataset to contain the desired x locations before the splines are calculated.
The resolution of those points are determined in the second line of the expand.grid()
function. Just make sure the original depth points are a subset of the expanded depth points (eg, don't use something uneven like by=.732
).
library(magrittr)
ds <- readr::read_csv("trt,depth,root,carbon\nA,2,1,14\nA,4,2,18\nA,6,3,18\nA,8,3,17\nA,10,1,12\nB,2,3,16\nB,4,4,18\nB,6,4,17\nB,8,2,15\nB,10,1,12")
ds_depths_possible <- expand.grid(
depth = seq(from=min(ds$depth), max(ds$depth), by=.5), #Decide resolution here.
trt = c("A", "B"),
stringsAsFactors = FALSE
)
ds_intpolated <- ds %>%
dplyr::right_join(ds_depths_possible, by=c("trt", "depth")) %>% #Incorporate locations to interpolate
dplyr::group_by(trt) %>%
dplyr::mutate(
root_interpolated = spline(x=depth, y=root , xout=depth)$y,
carbon_interpolated = spline(x=depth, y=carbon, xout=depth)$y
) %>%
dplyr::ungroup()
ds_intpolated
Output:
Source: local data frame [34 x 6]
trt depth root carbon root_interpolated carbon_interpolated
(chr) (dbl) (int) (int) (dbl) (dbl)
1 A 2.0 1 14 1.000000 14.00000
2 A 2.5 NA NA 1.195312 15.57031
3 A 3.0 NA NA 1.437500 16.72917
4 A 3.5 NA NA 1.710938 17.52344
5 A 4.0 2 18 2.000000 18.00000
6 A 4.5 NA NA 2.289062 18.21094
7 A 5.0 NA NA 2.562500 18.22917
8 A 5.5 NA NA 2.804688 18.13281
9 A 6.0 3 18 3.000000 18.00000
10 A 6.5 NA NA 3.132812 17.88281
.. ... ... ... ... ... ...
In the graphs above, the little points & lines are interpolated. The big fat points are observed.
library(ggplot2)
ggplot(ds_intpolated, aes(x=depth, y=root_interpolated, color=trt)) +
geom_line() +
geom_point(shape=1) +
geom_point(aes(y=root), size=5, alpha=.3, na.rm=T) +
theme_bw()
ggplot(ds_intpolated, aes(x=depth, y=carbon_interpolated, color=trt)) +
geom_line() +
geom_point(shape=1) +
geom_point(aes(y=carbon), size=5, alpha=.3, na.rm=T) +
theme_bw()
If you want an additional example, here's some recent code and slides. We needed a rolling median for some missing points, and linear stats::approx()
for some others. Another option is also stats::loess()
, but it's arguments aren't as similar as approx()
and spline()
.
Upvotes: 6
Reputation: 263301
I gave up trying to get dplyr::summarise_each
(and also tried dplyr::summarise, since your choice of functions didn't seem to match you desire for multiple column input to return only two functions.) I'm not sure it's possible in dply
. Here's what might be called the canonical method of approaching this:
lapply( split(df, df$trt), function(d) splinefun(x=d$depth, y=d$carbon) )
#-------------
$A
function (x, deriv = 0L)
{
deriv <- as.integer(deriv)
if (deriv < 0L || deriv > 3L)
stop("'deriv' must be between 0 and 3")
if (deriv > 0L) {
z0 <- double(z$n)
z[c("y", "b", "c")] <- switch(deriv, list(y = z$b, b = 2 *
z$c, c = 3 * z$d), list(y = 2 * z$c, b = 6 * z$d,
c = z0), list(y = 6 * z$d, b = z0, c = z0))
z[["d"]] <- z0
}
res <- .splinefun(x, z)
if (deriv > 0 && z$method == 2 && any(ind <- x <= z$x[1L]))
res[ind] <- ifelse(deriv == 1, z$y[1L], 0)
res
}
<bytecode: 0x7fe56e4853f8>
<environment: 0x7fe56efd3d80>
$B
function (x, deriv = 0L)
{
deriv <- as.integer(deriv)
if (deriv < 0L || deriv > 3L)
stop("'deriv' must be between 0 and 3")
if (deriv > 0L) {
z0 <- double(z$n)
z[c("y", "b", "c")] <- switch(deriv, list(y = z$b, b = 2 *
z$c, c = 3 * z$d), list(y = 2 * z$c, b = 6 * z$d,
c = z0), list(y = 6 * z$d, b = z0, c = z0))
z[["d"]] <- z0
}
res <- .splinefun(x, z)
if (deriv > 0 && z$method == 2 && any(ind <- x <= z$x[1L]))
res[ind] <- ifelse(deriv == 1, z$y[1L], 0)
res
}
<bytecode: 0x7fe56e4853f8>
<environment: 0x7fe56efc4db8>
Upvotes: 0