Peter
Peter

Reputation: 7790

Modify function calls within a formula

Suppose you are working on a regression model and at least one of the predictors is estimated via splines, e.g.,

library(splines)
data(diamonds, package = "ggplot2")

fit <- lm(price ~ bs(depth, degree = 5) + bs(carat, knots = c(2, 3)) * color, 
          data = diamonds)

The above fit is for illustrative purposes and has no meaningful reason for being.

Now, lets keep the same basic formula, but change the knot locations for both depth and carat. The update needs to take place in a dynamic way such that it could be part of a bigger MCMC method (number of knots and knot locations to be determined either by either a reversible jump or birth/death step).

I'm well aware of the update and update.formula calls, but I don't believe that these tools will help. The following pseudo code should illustrate the behavior of the function I'm planning to develop.

foo <- function(formula, data) { 

  # Original Model matrix, the formula will be of the form:
  Xmat_orig <- model.matrix(formula, data)

  # some fancy method for selecting new knot locations here
  # lots of cool R code....

  # pseudo code for the 'new knots'.  In the example formula above var1 would be
  # depth and var2 would be carat.  The number of elements in this list would be
  # dependent on the formula passed into foo.
  new_knots <- list(k1 = knot_locations_for_var1, 
                    k2 = knot_locations_for_var2)

  # updated model matrix: 
  # pseudo code for that the new model matrix call would look like.
  Xmat_new <- 
    model.matrix(y ~ bs(var1, degree = 5, knots = new_knots$k1) + bs(var2, knots = new_knots$k2) * color, 
                 data = data)

  return(Xmat_new) 
}

Can someone suggest a way to modify the knots call within either a bs or ns call dynamically?

Upvotes: 1

Views: 1030

Answers (4)

Peter
Peter

Reputation: 7790

EDIT:

Thank you @MrFlick, your solution is exactly what I was looking for.

#original post

Thank you to @MrFlick and @hadley, their responses on SO and Twitter help me to find a working solution. This method will need refinement, but seems to be working for my immediate needs.

The function with_new_knots defined below will parse the a formula and modify elements via terms. (I should also thank the author of the survival package, Terry Therneau, as I dug through that code to see how formulas were manipulated when functions such as strata were included in formulas.) I can already think up use cases where this function would fail, but the important part is that the outline of the method exists and I can extend and improve it later.

library(ggplot2)
library(reshape2)
library(dplyr)
library(magrittr)
library(splines)
set.seed(42)

with_new_knots <- function(frm, data, iterations = 5L) { 
  # extract the original formula
  old_terms   <- terms(frm, specials = c("bs", "ns"))

  # reconstruct the rhs of the formula with any interaction terms expanded
  cln     <- colnames(attr(old_terms, "factors")) 
  old_rhs <- paste(cln, collapse = " + ")

  # Extract the spline terms from the old_formula 
  idx              <- attr(old_terms, "specials") %>% unlist   %>% sort
  old_spline_terms <- attr(old_terms, "factors")  %>% rownames %>% extract(idx)

  # grab the variable names which splines are built on
  vars <- all.vars(frm)[idx]

  # define the range for each variable in vars
  rngs <- lapply(vars, function(x) { range(data[, x]) })

  # for each of the spline terms, randomly generate new knots
  # This is a silly example, something clever will replace it. 

  out <- replicate(iterations, 
                   {
                     new_knots <- lapply(rngs, function(r) { 
                                         kts <- sort(runif(sample(1:5, 1), min = r[1], max = r[2]))
                                         paste0("c(", paste(kts, collapse = ", "), ")")
                             })

                     new_spline_terms <- 
                       mapply(FUN = function(s, k) { sub(")$", paste0(", knots = ", k, ")"), s) },
                              s = old_spline_terms,
                              k = new_knots)

                     rhs <- old_rhs
                     for(i in 1:length(old_spline_terms)) { 
                       rhs <- gsub(old_spline_terms[i], new_spline_terms[i], rhs, fixed = TRUE)
                     }

                     f <- as.formula(paste(rownames(attr(old_terms, "factors"))[1], "~", rhs))
                     environment(f) <- environment(frm)
                     return(f)
                   }, 
                   simplify = FALSE) 
  return(out) 
}

example use:

Here a statistically meaningless model is presented and modified via with_new_knots to illustrate the results, one formula object is updated so that the spline calls within the formula have been updated.

f <- price ~ ns(carat) * color + bs(depth, degree = 5) + clarity
with_new_knots(f, diamonds)


orig_fit <- predict(lm(f, data = diamonds))
new_fits <- with_new_knots(f, diamonds) %>%
            lapply(., function(frm) { predict(lm(frm, data = diamonds)) })

dat <- data.frame(orig_fit, new_fits)
names(dat)[2:6] <- paste("new knots", 1:5)
dat <- melt(dat, id.vars = NULL)
dat <- cbind(dat, diamonds)

ggplot(dat) + 
aes(x = carat, y = value, color = color, shape = clarity) + 
geom_line() + 
geom_point(aes(y = price), alpha = 0.1) + 
facet_wrap( ~ variable, scale = "free")

Illustration of the different models with different knots

Upvotes: 0

MrFlick
MrFlick

Reputation: 206566

Here's another possibility that isn't as picky about what is input to your function. Consider this

newknots <- function(form, data, calls=c("bs","ns")) {
    nk <- function(x) { 
        sort(runif(sample(1:5, 1), min = min(data[[x]]), max = max(data[[x]])))
    }
    rr <- function(x, nk, calls) {
        if(is.call(x) && deparse(x[[1]]) %in% calls) {
            x$knots = nk(deparse(x[[2]]))
            x
        } else if (is.recursive(x)) {
            as.call(lapply(as.list(x), rr, nk, calls))
        } else {
            x
        }
    }
    z <- lapply(as.list(form), rr, nk, calls)   
    z <- eval(as.call(z))
    environment(z) <- environment(form)
    z
}

So this isn't exactly a trivial function, but hopefully it's not too bad. The idea is that we can convert a formula to a list object that we can recursively investigate. That's what the internal rr function is doing. It takes a list, and then looks at each of the elements. It looks for calls to bs or ns and when it finds them, it replaces the knots= parameter.

Here we use the kn function to create a new set of knots for a given variable name which is passed in as a string. We simply need to return a list of values appropriate for that variable.

Finally I need to turn the list back into a formula and I make sure our new object has the same environment as the original formula. So this actually does create a new formula object leaving the original intact (you may replace the original value if you wish).

Here's an example of how you would call/use this function.

f <- price ~ ns(carat, knots=c(1,3)) * color + bs(depth, degree = 5) + clarity
newknots(f, diamonds)

# price ~ ns(carat, knots = c(2.09726121873362, 3.94607368792873
# )) * color + bs(depth, degree = 5, knots = c(44.047089480795, 
# 47.8856966942549, 49.7632855847478, 70.9297015387565)) + clarity

So you can see that the knots were added and replaced according to our new rule. I'm not sure what other features you may need, but hopefully this will give you a good starting point.

Upvotes: 1

Matthew Plourde
Matthew Plourde

Reputation: 44624

Formulas are all bound to an environment. So one option is to create the formula separately with variables for the arguments you might want to change and assign these variables values in the environment of the function.

f <- as.formula("price ~ bs(depth, knots=d_knots) + bs(carat, knots=c_knots) * color", 
                list2env(list(d_knots=c(2,3), c_knots=c(3,2))))

I've defined two defaults for d_knots and c_knots. To then modify these values:

environment(f)$d_knots <- c(2,3)
environment(f)$c_knots <- c(3, 2)

You can then supply the formula to the modeling function

fit <- lm(f, data=diamonds)

Upvotes: 1

Xin Yin
Xin Yin

Reputation: 2936

You can use substitute function in R, which:

substitute(expr, env) substitute returns the parse tree for the (unevaluated) expression expr, substituting any variables bound in env.

For example:

> rm(list=ls())
> x <- 1
> x + y
Error: object 'y' not found

Because y is not defined. Now use substitute:

> (expr <- substitute(x + y, list(y=2)))
x + 2
> eval(expr)
[1] 3
> z <- 2
> (expr <- substitute(x + y, list(y=z)))
x + 2
> eval(expr)
[1] 3

In your example:

f1 <- eval(substitute(price ~ bs(depth, degree = deg) + bs(carat, knots = knts) * color, 
                       list(deg=5, knts=c(2, 3))))
f2 <- eval(substitute(price ~ bs(depth, degree = deg) + bs(carat, knots = knts) * color,
                       list(deg=6, knts=c(3, 4))))

fit1 <- lm(f1, data=diamonds)
fit2 <- lm(f2, data=diamonds)

In general, you can write a function that wraps the substitute call, say:

formula.with.knots <- function(degree, knots) {
  f.expr <- substitute(price ~ bs(depth, degree = deg) + bs(carat, knots = knts) * color, 
                        list(deg=degree, knts=knots))

  eval(f.expr)
}

f <- formula.with.knots(5, c(2, 3))
fit <- lm(f, data = diamonds)
summary(fit)

Upvotes: 1

Related Questions