Reputation: 2780
I am trying to learn how to write my own functions that take formula
objects. I am mostly confused about how to parse them.
Let's say I have the following:
gigl <- function(formula, data, family = gaussian())
Using the R dataset BOD
> BOD
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8
It is easy to fit a linear model with lm
> lm(Time ~ demand, data = BOD)
Call:
lm(formula = Time ~ demand)
Coefficients:
(Intercept)       demand
    -1.8905       0.3746
How can I make my own function by parsing a formula?
For example, if I had
> gigl(Time ~ demand, data = BOD)
how can I parse the components? I don't really care what the function gigl
does. I just want to know how to work with the formula.
Since people asked for a concrete example, let's try the following:
say that I want to use the inputs from a formula to build a cor()
matrix. From the above I would see the result of cor(Time, demand),
and if more variables were added I would see the complete cor()
matrix of all inputs.
Upvotes: 10
Views: 4140
Reputation: 206197
The rlang package can make it easier to work with formulas in the tidyeval paradigm. For example, you can do
library(rlang)
mycor <- function(form, data) {
  v1 <- f_lhs(form)
  v2 <- f_rhs(form)
  d <- enquo(data)
  qq <- expr(with(!!d, cor(!!v1, !!v2)))
  eval_tidy(qq)
}
mycor(disp~drat, mtcars)
# [1] -0.7102139
with(mtcars, cor(disp, drat))
# [1] -0.7102139
The f_lhs/f_rhs functions extract the left-hand side and right-hand side of the formula, respectively. Then we use enquo(), expr() and the !! operator to re-assemble those pieces into a new function call. Finally we evaluate that new function call with eval_tidy.
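For comparison, here is a minimal base-R sketch of the same idea without rlang. It relies on the fact that a two-sided formula is a call whose second and third elements are the left- and right-hand sides. The name mycor_base is hypothetical, and the sketch assumes both sides evaluate to numeric vectors inside the data.

```r
# Hypothetical base-R counterpart of mycor(); assumes a two-sided formula
# whose two sides evaluate to numeric vectors in `data`.
mycor_base <- function(form, data) {
  stopifnot(inherits(form, "formula"), length(form) == 3L)
  lhs <- form[[2]]          # left-hand side, e.g. the symbol disp
  rhs <- form[[3]]          # right-hand side, e.g. the symbol drat
  env <- environment(form)  # environment where the formula was created
  # Names are looked up in `data` first, then in the formula's environment
  cor(eval(lhs, data, env), eval(rhs, data, env))
}

mycor_base(disp ~ drat, mtcars)
```

Because each side is evaluated as an expression, something like mycor_base(log(disp) ~ drat, mtcars) should also work under the same assumptions.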
Upvotes: 2
Reputation: 269501
This assumes that two variables are used (expressions are not allowed). Assuming that the two variables are in the formula and that they can appear on the right or left or both, all.vars, which gets the variable names, and get_all_vars, which gets the content, can be useful:
gig1 <- function(formula, data) cor(data[all.vars(formula)])
gig1(demand ~ Time, BOD)
giving:
demand Time
demand 1.0000000 0.8030693
Time 0.8030693 1.0000000
or
gig2 <- function(formula, data) cor(get_all_vars(formula, data))
gig2(demand ~ Time, BOD)
giving:
demand Time
demand 1.0000000 0.8030693
Time 0.8030693 1.0000000
You might want to look at the source of lm and the Formula package for more ideas.
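As one idea in the direction of what lm does, here is a rough sketch built on model.frame(), which evaluates every term of the formula against the data. The name gig3 is hypothetical. It assumes all resulting columns are numeric, and note that model.frame() drops rows with missing values by default.

```r
# Hypothetical gig3(): like gig2(), but model.frame() evaluates each term,
# so transformed variables such as log(demand) are allowed.
gig3 <- function(formula, data) {
  mf <- model.frame(formula, data = data)  # one column per evaluated variable
  cor(mf)                                  # assumes every column is numeric
}

gig3(Time ~ log(demand), BOD)
```

The columns of the result are named after the terms as written in the formula, here Time and log(demand).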
Upvotes: 2
Reputation: 226162
Here's a function that takes a formula and transforms it into a call to the cor() function, then evaluates that call in an environment consisting of the data:
f <- function(form, data) {
  form[[1]] <- quote(cor)
  eval(form, data)
}
f(demand~Time,BOD)
## [1] 0.8030693
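The same trick of rewriting the formula into a call can be stretched to more than two variables, which is what the question asks for. The sketch below is only one way to do it: it assumes the formula contains plain numeric variables, and f_all is a hypothetical name.

```r
# Hypothetical f_all(): builds the call cor(cbind(v1, v2, ...)) from the
# variable names found in the formula, then evaluates it inside `data`.
f_all <- function(form, data) {
  vars <- lapply(all.vars(form), as.name)          # list of symbols
  inner <- as.call(c(list(quote(cbind)), vars))    # cbind(v1, v2, ...)
  cl <- call("cor", inner)                         # cor(cbind(v1, v2, ...))
  eval(cl, data, environment(form))
}

f_all(mpg ~ disp + hp, mtcars)
```

Printing cl inside the function is a handy way to see exactly which call gets evaluated.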
Upvotes: 6
Reputation: 50668
Not sure what you're trying to do, but you could take a look at the terms of a formula:
fm <- formula(Time ~ demand);
tms <- terms(fm);
tms;
#Time ~ demand
#attr(,"variables")
#list(Time, demand)
#attr(,"factors")
#       demand
#Time        0
#demand      1
#attr(,"term.labels")
#[1] "demand"
#attr(,"order")
#[1] 1
#attr(,"intercept")
#[1] 1
#attr(,"response")
#[1] 1
#attr(,".Environment")
#<environment: R_GlobalEnv>
From tms you could extract relevant entries and attributes. For example,
attr(tms, "variables");
#list(Time, demand)
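A few more attributes can be pulled out the same way. The sketch below uses a made-up formula with two predictors and shows one possible way to recover the predictor labels and the response symbol.

```r
# Illustrative formula (not from the question) with two predictors
tms <- terms(mpg ~ disp + hp)

predictors <- attr(tms, "term.labels")  # labels of the right-hand-side terms
resp_idx <- attr(tms, "response")       # 1 if there is a response, 0 otherwise
vars <- attr(tms, "variables")          # the call list(mpg, disp, hp)

# The first element of `vars` is the symbol `list`, so the response sits
# one position further along.
response <- vars[[resp_idx + 1L]]

predictors
response
```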
Upvotes: 1