Reputation: 7790
Suppose you are working on a regression model and at least one of the predictors is estimated via splines, e.g.,
library(splines)
data(diamonds, package = "ggplot2")
fit <- lm(price ~ bs(depth, degree = 5) + bs(carat, knots = c(2, 3)) * color,
data = diamonds)
The above fit is for illustrative purposes and has no meaningful reason for being.
Now, lets keep the same basic formula, but change the knot locations for both depth and carat. The update needs to take place in a dynamic way such that it could be part of a bigger MCMC method (number of knots and knot locations to be determined either by either a reversible jump or birth/death step).
I'm well aware of the update
and update.formula
calls, but I don't believe
that these tools will help. The following pseudo code should illustrate the
behavior of the function I'm planning to develop.
foo <- function(formula, data) {
# Original Model matrix, the formula will be of the form:
Xmat_orig <- model.matrix(formula, data)
# some fancy method for selecting new knot locations here
# lots of cool R code....
# pseudo code for the 'new knots'. In the example formula above var1 would be
# depth and var2 would be carat. The number of elements in this list would be
# dependent on the formula passed into foo.
new_knots <- list(k1 = knot_locations_for_var1,
k2 = knot_locations_for_var2)
# updated model matrix:
# pseudo code for that the new model matrix call would look like.
Xmat_new <-
model.matrix(y ~ bs(var1, degree = 5, knots = new_knots$k1) + bs(var2, knots = new_knots$k2) * color,
data = data)
return(Xmat_new)
}
Can someone suggest a way to modify the knots
call within either a bs
or
ns
call dynamically?
Upvotes: 1
Views: 1030
Reputation: 7790
Thank you @MrFlick, your solution is exactly what I was looking for.
Thank you to @MrFlick and @hadley, their responses on SO and Twitter help me to find a working solution. This method will need refinement, but seems to be working for my immediate needs.
The function with_new_knots
defined below will parse the a formula
and modify elements via terms
. (I should also thank the author of the survival
package, Terry Therneau, as I dug through that code to see how formulas were manipulated when functions such as strata
were included in formulas.) I can already think up use cases where this function would fail, but the important part is that the outline of the method exists and I can extend and improve it later.
library(ggplot2)
library(reshape2)
library(dplyr)
library(magrittr)
library(splines)
set.seed(42)
with_new_knots <- function(frm, data, iterations = 5L) {
# extract the original formula
old_terms <- terms(frm, specials = c("bs", "ns"))
# reconstruct the rhs of the formula with any interaction terms expanded
cln <- colnames(attr(old_terms, "factors"))
old_rhs <- paste(cln, collapse = " + ")
# Extract the spline terms from the old_formula
idx <- attr(old_terms, "specials") %>% unlist %>% sort
old_spline_terms <- attr(old_terms, "factors") %>% rownames %>% extract(idx)
# grab the variable names which splines are built on
vars <- all.vars(frm)[idx]
# define the range for each variable in vars
rngs <- lapply(vars, function(x) { range(data[, x]) })
# for each of the spline terms, randomly generate new knots
# This is a silly example, something clever will replace it.
out <- replicate(iterations,
{
new_knots <- lapply(rngs, function(r) {
kts <- sort(runif(sample(1:5, 1), min = r[1], max = r[2]))
paste0("c(", paste(kts, collapse = ", "), ")")
})
new_spline_terms <-
mapply(FUN = function(s, k) { sub(")$", paste0(", knots = ", k, ")"), s) },
s = old_spline_terms,
k = new_knots)
rhs <- old_rhs
for(i in 1:length(old_spline_terms)) {
rhs <- gsub(old_spline_terms[i], new_spline_terms[i], rhs, fixed = TRUE)
}
f <- as.formula(paste(rownames(attr(old_terms, "factors"))[1], "~", rhs))
environment(f) <- environment(frm)
return(f)
},
simplify = FALSE)
return(out)
}
Here a statistically meaningless model is presented and modified via with_new_knots
to illustrate the results, one formula
object is updated so that the spline
calls within the formula have been updated.
f <- price ~ ns(carat) * color + bs(depth, degree = 5) + clarity
with_new_knots(f, diamonds)
orig_fit <- predict(lm(f, data = diamonds))
new_fits <- with_new_knots(f, diamonds) %>%
lapply(., function(frm) { predict(lm(frm, data = diamonds)) })
dat <- data.frame(orig_fit, new_fits)
names(dat)[2:6] <- paste("new knots", 1:5)
dat <- melt(dat, id.vars = NULL)
dat <- cbind(dat, diamonds)
ggplot(dat) +
aes(x = carat, y = value, color = color, shape = clarity) +
geom_line() +
geom_point(aes(y = price), alpha = 0.1) +
facet_wrap( ~ variable, scale = "free")
Upvotes: 0
Reputation: 206566
Here's another possibility that isn't as picky about what is input to your function. Consider this
newknots <- function(form, data, calls=c("bs","ns")) {
nk <- function(x) {
sort(runif(sample(1:5, 1), min = min(data[[x]]), max = max(data[[x]])))
}
rr <- function(x, nk, calls) {
if(is.call(x) && deparse(x[[1]]) %in% calls) {
x$knots = nk(deparse(x[[2]]))
x
} else if (is.recursive(x)) {
as.call(lapply(as.list(x), rr, nk, calls))
} else {
x
}
}
z <- lapply(as.list(form), rr, nk, calls)
z <- eval(as.call(z))
environment(z) <- environment(form)
z
}
So this isn't exactly a trivial function, but hopefully it's not too bad. The idea is that we can convert a formula to a list object that we can recursively investigate. That's what the internal rr
function is doing. It takes a list, and then looks at each of the elements. It looks for calls to bs
or ns
and when it finds them, it replaces the knots=
parameter.
Here we use the kn
function to create a new set of knots for a given variable name which is passed in as a string. We simply need to return a list of values appropriate for that variable.
Finally I need to turn the list back into a formula and I make sure our new object has the same environment as the original formula. So this actually does create a new formula object leaving the original intact (you may replace the original value if you wish).
Here's an example of how you would call/use this function.
f <- price ~ ns(carat, knots=c(1,3)) * color + bs(depth, degree = 5) + clarity
newknots(f, diamonds)
# price ~ ns(carat, knots = c(2.09726121873362, 3.94607368792873
# )) * color + bs(depth, degree = 5, knots = c(44.047089480795,
# 47.8856966942549, 49.7632855847478, 70.9297015387565)) + clarity
So you can see that the knots were added and replaced according to our new rule. I'm not sure what other features you may need, but hopefully this will give you a good starting point.
Upvotes: 1
Reputation: 44624
Formulas are all bound to an environment. So one option is to create the formula separately with variables for the arguments you might want to change and assign these variables values in the environment of the function.
f <- as.formula("price ~ bs(depth, knots=d_knots) + bs(carat, knots=c_knots) * color",
list2env(list(d_knots=c(2,3), c_knots=c(3,2))))
I've defined two defaults for d_knots
and c_knots
. To then modify these values:
environment(f)$d_knots <- c(2,3)
environment(f)$c_knots <- c(3, 2)
You can then supply the formula to the modeling function
fit <- lm(f, data=diamonds)
Upvotes: 1
Reputation: 2936
You can use substitute
function in R, which:
substitute(expr, env) substitute returns the parse tree for the (unevaluated) expression expr, substituting any variables bound in env.
For example:
> rm(list=ls())
> x <- 1
> x + y
Error: object 'y' not found
Because y
is not defined. Now use substitute
:
> (expr <- substitute(x + y, list(y=2)))
x + 2
> eval(expr)
[1] 3
> z <- 2
> (expr <- substitute(x + y, list(y=z)))
x + 2
> eval(expr)
[1] 3
In your example:
f1 <- eval(substitute(price ~ bs(depth, degree = deg) + bs(carat, knots = knts) * color,
list(deg=5, knts=c(2, 3))))
f2 <- eval(substitute(price ~ bs(depth, degree = deg) + bs(carat, knots = knts) * color,
list(deg=6, knts=c(3, 4))))
fit1 <- lm(f1, data=diamonds)
fit2 <- lm(f2, data=diamonds)
In general, you can write a function that wraps the substitute
call, say:
formula.with.knots <- function(degree, knots) {
f.expr <- substitute(price ~ bs(depth, degree = deg) + bs(carat, knots = knts) * color,
list(deg=degree, knts=knots))
eval(f.expr)
}
f <- formula.with.knots(5, c(2, 3))
fit <- lm(f, data = diamonds)
summary(fit)
Upvotes: 1