Reputation: 8494
I'd like to write a function like this:
library(survival)
getFit = function(x, data){
survfit(Surv(start, stop, event) ~ x, data = data)
}
getFit(surgery, heart)
getFit("surgery", heart) #if not possible, this would be fine too
Of course, this doesn't work: x is taken literally in the formula instead of being replaced by the argument. Please note that I use survfit with heart only as an example; I have encountered this problem with nearly every formula-based function (lm, glm, etc.).
I know I could write something with paste and as.formula, but I wondered if there was something closer to what I'd do the tidyverse way, something like:
getSurvPlot = function(x, data=db){
xx = enquo(x)
survfit(Surv(start, stop, event) ~ !!xx, data = data)
}
This last code doesn't work either, but I think this is because survfit is not part of the tidyverse.
Is there any clean way to write something like this in base R?
EDIT: For this particular example, I would now use survminer::surv_fit, a wrapper around survfit that allows more flexibility in formulas.
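For reference, the paste/as.formula route mentioned above might look like the sketch below. It uses lm and the built-in mtcars data as stand-ins so that it runs without extra packages; fit_by_name is a hypothetical name, and the column is passed as a string.

```r
# Hypothetical helper: the predictor is given as a character string.
fit_by_name <- function(x, data) {
  # Build "mpg ~ <x>" as text, then convert the text to a formula
  f <- as.formula(paste("mpg ~", x))
  lm(f, data = data)
}

fit <- fit_by_name("wt", mtcars)
names(coef(fit))
# "(Intercept)" "wt"

# reformulate() from stats does the same job without manual pasting
f2 <- reformulate("wt", response = "mpg")
```

The same pattern should carry over to survfit by swapping the response text for the Surv(...) call.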
Upvotes: 2
Views: 623
Reputation: 13691
Moody_mudskipper gave a very nice detailed answer. I just wanted to note that your definition of getSurvPlot is almost correct. Your issue is not with rlang/tidyverse, but with using the quoted argument (which is a formula) inside another formula.
When calling getSurvPlot(surgery, heart), enquo will capture the first argument as ~surgery, which is already a formula. Rather than using ~ to create a new formula from xx and Surv, you only need to update the left-hand side of the formula you already have. This can be done using stats::update() from base R:
getSurvPlot <- function(x, data=db){
xx <- enquo(x)
survfit(stats::update( xx, Surv(start, stop, event) ~ . ), data = data)
}
getSurvPlot(surgery, heart) should now work as expected.
As pointed out by @Moody_mudskipper, the actual work is done by stats::update.formula(), which is an implementation of the S3 generic stats::update() for formula objects, such as xx.
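To see what update() does in isolation, here is a small sketch on a plain one-sided formula, which stands in for the captured argument. It uses lm and the built-in mtcars data purely as an assumption for illustration, so that no extra packages are needed.

```r
rhs <- ~ wt                  # one-sided formula, playing the role of xx
f <- update(rhs, mpg ~ .)    # "." means "keep the existing right-hand side"
print(f)
# mpg ~ wt

fit <- lm(f, data = mtcars)  # the updated formula is used like any other
```

The template mpg ~ . supplies the new left-hand side, while the dot carries over whatever was already on the right.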
Upvotes: 2
Reputation: 47320
Here are three options: the first is pure base R and the other two use rlang.
base R
getFit1 = function(x, data){
survfit(eval(substitute(Surv(start, stop, event) ~ x)), data = data)
}
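Since the question mentions lm and glm as well, here is the same eval(substitute()) pattern sketched with lm on the built-in mtcars data. The name get_lm and the fixed response mpg are assumptions made for this illustration only.

```r
get_lm <- function(x, data) {
  # substitute() swaps x for the unevaluated argument, e.g. mpg ~ wt;
  # eval() then turns that call into an actual formula object.
  f <- eval(substitute(mpg ~ x))
  lm(f, data = data)
}

fit1 <- get_lm(wt, mtcars)
fit2 <- get_lm(wt + hp, mtcars)  # whole expressions work too
names(coef(fit2))
# "(Intercept)" "wt" "hp"
```

Because the argument is never evaluated on its own, wt does not need to exist outside the data frame.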
rlang
There is no reason to use enquo here: it builds an object that carries the calling environment, whereas in your example the object surgery doesn't exist in your global environment; it only needs to be evaluated in the context of the formula. So substitute is the appropriate function here as well.
To be able to use the !! operator, we need a function that supports quasiquoted arguments, and then we need to evaluate the result or convert it to a formula (I don't know of a function that does both in one step).
So we end up with something that doesn't look any better than the base version.
library(rlang) # provides expr() and the !! operator
getFit2 = function(x, data){
xx <- substitute(x)
survfit(eval(expr(Surv(start, stop, event) ~ !!xx)), data = data)
}
rlang again, using new_formula
We can build a formula from its lhs and rhs, but now we need to quote the lhs and we still need expr on the rhs, so the base solution still seems better suited for this case.
getFit3 = function(x, data){
xx <- substitute(x)
survfit(new_formula(quote(Surv(start, stop, event)), expr(!!xx)), data = data)
}
output
getFit1(surgery, heart)
# Call: survfit(formula = eval(substitute(Surv(start, stop, event) ~
# x)), data = data)
#
# records n.max n.start events median 0.95LCL 0.95UCL
# surgery=0 143 87 0 66 80 66 188
# surgery=1 29 16 0 9 980 186 NA
getFit2(surgery, heart)
# Call: survfit(formula = eval(expr(Surv(start, stop, event) ~ !!xx)),
# data = data)
#
# records n.max n.start events median 0.95LCL 0.95UCL
# surgery=0 143 87 0 66 80 66 188
# surgery=1 29 16 0 9 980 186 NA
getFit3(surgery, heart)
# Call: survfit(formula = new_formula(quote(Surv(start, stop, event)),
# expr(!!xx)), data = data)
#
# records n.max n.start events median 0.95LCL 0.95UCL
# surgery=0 143 87 0 66 80 66 188
# surgery=1 29 16 0 9 980 186 NA
Upvotes: 2
Reputation: 79238
We can literally replace the RHS of the formula with the variable we need:
getFit = function(var, data){
var=as.name(substitute(var))
survfit(`[<-`(Surv(time, status)~. ,3,list(var)), data = data)
}
getFit(x,aml)
Call: survfit(formula = `[<-`(Surv(time, status) ~ ., 3, list(var)),
data = data)
n events median 0.95LCL 0.95UCL
x=Maintained 11 7 31 18 NA
x=Nonmaintained 12 11 23 8 NA
getFit("x",aml)
Call: survfit(formula = `[<-`(Surv(time, status) ~ ., 3, list(var)),
data = data)
n events median 0.95LCL 0.95UCL
x=Maintained 11 7 31 18 NA
x=Nonmaintained 12 11 23 8 NA
How do we know it is correct?
survfit(Surv(time, status) ~ x, data = aml)
Call: survfit(formula = Surv(time, status) ~ x, data = aml)
n events median 0.95LCL 0.95UCL
x=Maintained 11 7 31 18 NA
x=Nonmaintained 12 11 23 8 NA
You can use:
getFit = function(var, data){
var=as.name(substitute(var))
a = `[<-`(Surv(time, status)~. ,3,list(var))
survfit(a, data = data)
}
or
getFit = function(var, data){
var=as.name(substitute(var))
survfit(formula(substitute(Surv(time, status)~var,list(var=var))), data = data)
}
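The replacement trick works because a formula is a call with three elements: the ~ function, the left-hand side and the right-hand side. A minimal sketch, using a made-up formula on mtcars column names rather than the survival example:

```r
f <- mpg ~ .
length(f)                  # 3: `~`, the LHS, the RHS
f[[3]] <- as.name("wt")    # overwrite the RHS with a symbol
print(f)
# mpg ~ wt

# as.name() gives the same symbol for a string and for a bare name,
# which is why both getFit(x, aml) and getFit("x", aml) work.
same <- identical(as.name("wt"), as.name(quote(wt)))
```

Indexing with 2 instead of 3 would target the left-hand side in the same way.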
Upvotes: 0