Dan Chaltiel

Reputation: 8494

How to use quosure-like syntax in base R formulas?

I'd like to write a function like this:

library(survival)
getFit = function(x, data){
  survfit(Surv(start, stop, event) ~ x, data = data)
}
getFit(surgery, heart)
getFit("surgery", heart) #if not possible, this would be fine too

Of course, x is not substituted: the formula is taken literally, so survfit looks for a variable named x. Note that I use survfit with heart as an example, but I have encountered this problem with nearly every formula-based function (lm, glm, etc.).

I know I could write something with paste and as.formula, but I wondered whether there is something closer to what I'd do in the tidyverse, something like:

getSurvPlot = function(x, data=db){
  xx = enquo(x)
  survfit(Surv(start, stop, event) ~ !!xx, data = data)
}

This last code doesn't work either, but I think this is because survfit is not part of the tidyverse.

Is there any clean way to write something like this in base R?

EDIT : In this very example, I'd now be using survminer::surv_fit, which is a wrapper around survfit allowing more flexibility in formulas.

Upvotes: 2

Views: 623

Answers (3)

Artem Sokolov

Reputation: 13691

Moody_mudskipper gave a very nice detailed answer. I just wanted to note that your definition of getSurvPlot is almost correct. Your issue is not with rlang/tidyverse, but in using the quoted argument (which is a formula) inside another formula.

When calling getSurvPlot(surgery, heart), enquo will capture the first argument as ~surgery, which is already a formula. Rather than using ~ to create a new formula from xx and Surv, you only need to update the left-hand side of the formula you already have. This can be done using stats::update() from base R:

getSurvPlot <- function(x, data=db){
  xx <- enquo(x)
  survfit(stats::update( xx, Surv(start, stop, event) ~ . ), data = data)
}

getSurvPlot(surgery, heart) should now work as expected.

As pointed out by @Moody_mudskipper, the actual work is done by stats::update.formula(), which is an implementation of the S3 generic stats::update() for formula objects, such as xx.
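To illustrate what update() does to the formula, here is a minimal sketch that uses a plain one-sided formula in place of the quosure. The names rhs_only and full are only illustrative, and nothing is evaluated, so the survival package is not needed to run it:

```r
# A one-sided formula standing in for what the caller supplied
rhs_only <- ~ surgery

# In the template, `.` on the right-hand side means
# "keep the right-hand side of the old formula"
full <- update(rhs_only, Surv(start, stop, event) ~ .)

print(full)
# Surv(start, stop, event) ~ surgery
```

Only the left-hand side is filled in; the right-hand side is carried over unchanged from the original formula.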

Upvotes: 2

moodymudskipper

Reputation: 47320

Here are 3 options: the first is pure base R and the next two use rlang.

base R

getFit1 = function(x, data){
  survfit(eval(substitute(Surv(start, stop, event) ~ x)), data = data)
}
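To see what substitute does here, the same pattern can be tried on a toy formula. This is only a sketch: make_formula and the response y are hypothetical names, not part of the original answer.

```r
# Hypothetical helper: builds a formula whose right-hand side is the
# unevaluated expression passed as `x`
make_formula <- function(x) {
  # substitute() replaces `x` with the expression the caller typed;
  # `y` is not an argument or local variable, so it is left untouched
  unevaluated <- substitute(y ~ x)
  # `unevaluated` is a call to `~`; eval() turns it into a formula object
  eval(unevaluated)
}

f <- make_formula(surgery)
class(f)     # "formula"
all.vars(f)  # "y" "surgery"
```

The symbol surgery is never looked up while the formula is built; it is resolved later, against the data passed to the modelling function.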

rlang

There is no reason to use enquo here, as it builds an object that carries the calling environment. In your example the object surgery doesn't exist in your global environment; it only needs to be evaluated in the context of the formula. So substitute is the appropriate function here as well.

To be able to use the !! operator we need a function that supports quasiquoted arguments, and then we need to evaluate the result or convert it to a formula (I don't know of a function that does both in one step).

So we end up with something that doesn't look better than the base version.

getFit2 = function(x, data){
  xx <- substitute(x)
  survfit(eval(expr(Surv(start, stop, event) ~ !!xx)), data = data)
}

rlang again, using new_formula

We can build a formula from its lhs and rhs, but now we need to quote the lhs and we still need expr on the rhs, so the base solution still seems better suited for this case.

getFit3 = function(x, data){
  xx <- substitute(x)
  survfit(new_formula(quote(Surv(start, stop, event)), expr(!!xx)), data = data)
}

output

getFit1(surgery, heart)
# Call: survfit(formula = eval(substitute(Surv(start, stop, event) ~ 
#                                           x)), data = data)
# 
#           records n.max n.start events median 0.95LCL 0.95UCL
# surgery=0     143    87       0     66     80      66     188
# surgery=1      29    16       0      9    980     186      NA

getFit2(surgery, heart)
# Call: survfit(formula = eval(expr(Surv(start, stop, event) ~ !!xx)), 
#               data = data)
# 
#           records n.max n.start events median 0.95LCL 0.95UCL
# surgery=0     143    87       0     66     80      66     188
# surgery=1      29    16       0      9    980     186      NA

getFit3(surgery, heart)
# Call: survfit(formula = new_formula(quote(Surv(start, stop, event)), 
#                                     expr(!!xx)), data = data)
# 
#           records n.max n.start events median 0.95LCL 0.95UCL
# surgery=0     143    87       0     66     80      66     188
# surgery=1      29    16       0      9    980     186      NA

Upvotes: 2

Onyambu

Reputation: 79238

We can literally replace the RHS of the formula with the variable we need:

getFit = function(var, data){
  var=as.name(substitute(var))

  survfit(`[<-`(Surv(time, status)~. ,3,list(var)), data = data)
}

getFit(x,aml)
Call: survfit(formula = `[<-`(Surv(time, status) ~ ., 3, list(var)), 
    data = data)

                 n events median 0.95LCL 0.95UCL
x=Maintained    11      7     31      18      NA
x=Nonmaintained 12     11     23       8      NA

getFit("x",aml)
Call: survfit(formula = `[<-`(Surv(time, status) ~ ., 3, list(var)), 
    data = data)

                 n events median 0.95LCL 0.95UCL
x=Maintained    11      7     31      18      NA
x=Nonmaintained 12     11     23       8      NA

How do we know it is correct?

survfit(Surv(time, status) ~ x, data = aml) 
Call: survfit(formula = Surv(time, status) ~ x, data = aml)

                 n events median 0.95LCL 0.95UCL
x=Maintained    11      7     31      18      NA
x=Nonmaintained 12     11     23       8      NA

You can also write it as:

getFit = function(var, data){
  var=as.name(substitute(var))
  a = `[<-`(Surv(time, status)~. ,3,list(var))
  survfit(a, data = data)
}

or

getFit = function(var, data){
  var=as.name(substitute(var))

  survfit(formula(substitute(Surv(time, status)~var,list(var=var))), data = data)
}
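The reason both getFit(x, aml) and getFit("x", aml) work is that as.name(substitute(var)) gives the same symbol whether the caller passes a bare name or a string. A small sketch, where to_symbol is a hypothetical helper used only to isolate that step:

```r
# Hypothetical helper isolating the conversion used in getFit
to_symbol <- function(var) {
  # substitute() returns the symbol `x` for a bare name and the
  # string "x" for a quoted one; as.name() maps both to the symbol `x`
  as.name(substitute(var))
}

from_name   <- to_symbol(x)
from_string <- to_symbol("x")

is.name(from_name)                  # TRUE
identical(from_name, from_string)   # TRUE
```

Note that this relies on the argument being a single variable name; an expression such as x + z would not convert to a symbol this way.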

Upvotes: 0
