Alex

Reputation: 2780

How to work with formula objects in R

I am trying to learn how to write my own functions that take formula objects. I am mostly confused about how to parse them.

Let's say I have the following function signature:

gigl <- function(formula, data, family = gaussian()) {
  # body to be written: this is where the formula needs to be parsed
}

Using the built-in R dataset BOD:

> BOD
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8

It is easy to fit a linear model with lm:

> lm(Time ~ demand, data = BOD)
Call:
lm(formula = Time ~ demand, data = BOD)

Coefficients:
(Intercept)       demand  
    -1.8905       0.3746

How can I make my own function by parsing a formula?

For example, if I had

> gigl(Time ~ demand, data = BOD)

How can I parse the components? I don't really care what the function gigl does. I just want to know how to work with the formula.
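A formula is stored as an unevaluated call to `~`, so it can be taken apart like a list. The sketch below uses plain base R and the question's `Time ~ demand`; the names `f` and `g` are just illustrative.

```r
f <- Time ~ demand

class(f)     # "formula"
length(f)    # 3: the `~` operator, the left-hand side, the right-hand side
f[[1]]       # the function being called: `~`
f[[2]]       # left-hand side, the symbol Time
f[[3]]       # right-hand side, the symbol demand
all.vars(f)  # variable names as a character vector: "Time" "demand"

# A one-sided formula such as ~ Time + demand has length 2 and no left-hand side
g <- ~ Time + demand
length(g)    # 2
g[[2]]       # the whole right-hand side call: Time + demand
```

Once you have these pieces, a function like gigl can inspect, rearrange, or evaluate them against `data`.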

Edit

Since a concrete example was requested, let's try the following:

Say I want to use the variables in a formula to build a correlation matrix with cor(). From the example above I would get the result of cor(Time, demand), and if more variables were added I would get the full correlation matrix of all of them.

Upvotes: 10

Views: 4140

Answers (4)

MrFlick

Reputation: 206197

The rlang package makes it easier to work with formulas in the tidy evaluation (tidyeval) framework. For example:

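library(rlang)

mycor <- function(form, data) {
  v1 <- f_lhs(form)   # left-hand side of the formula, e.g. disp
  v2 <- f_rhs(form)   # right-hand side of the formula, e.g. drat
  d <- enquo(data)    # capture the data argument as a quosure
  qq <- expr(with(!!d, cor(!!v1, !!v2)))  # build with(data, cor(lhs, rhs))
  eval_tidy(qq)       # evaluate the assembled call
}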

mycor(disp~drat, mtcars)
# [1] -0.7102139

with(mtcars, cor(disp, drat))
# [1] -0.7102139

The f_lhs/f_rhs functions extract the left-hand side and right-hand side, respectively. Then we use expr() and the !! operator to reassemble those pieces into a new function call. Finally, we evaluate that new call with eval_tidy().
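If you would rather avoid the rlang dependency, the same idea (pull out both sides, splice them into a new cor() call, evaluate it against the data) can be sketched in base R with bquote(). The name `mycor_base` is hypothetical, and like `mycor` it assumes a two-sided formula with one variable on each side.

```r
mycor_base <- function(form, data) {
  lhs <- form[[2]]                     # left-hand side, e.g. disp
  rhs <- form[[3]]                     # right-hand side, e.g. drat
  cl <- bquote(cor(.(lhs), .(rhs)))    # splice both sides into cor(lhs, rhs)
  # Look variables up in `data` first, then in the formula's environment
  eval(cl, data, environment(form))
}

mycor_base(disp ~ drat, mtcars)
```

This should agree with with(mtcars, cor(disp, drat)).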

Upvotes: 2

G. Grothendieck

Reputation: 269501

This assumes that only variables are used (expressions are not allowed) and that all of them should go into the correlation matrix, whether they appear on the left, the right, or both sides of the formula. all.vars, which gets the variable names, and get_all_vars, which gets their contents, are both useful here:

gig1 <- function(formula, data) cor(data[all.vars(formula)])

gig1(demand ~ Time, BOD)

giving:

          demand      Time
demand 1.0000000 0.8030693
Time   0.8030693 1.0000000

or

gig2 <- function(formula, data) cor(get_all_vars(formula, data))

gig2(demand ~ Time, BOD)

giving:

          demand      Time
demand 1.0000000 0.8030693
Time   0.8030693 1.0000000
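The caveat about expressions comes from the fact that all.vars() and get_all_vars() only see the underlying variable names, not the transformations applied to them. A quick base R check, using a formula made up for illustration:

```r
form <- log(demand) ~ Time + I(Time^2)

all.vars(form)                  # "demand" "Time": the transformations are dropped
names(get_all_vars(form, BOD))  # also "demand" "Time": raw columns, not log(demand)
```

So gig1 and gig2 would correlate demand itself, not log(demand).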

You might want to look at the source of lm and the Formula package for more ideas.
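Following the pointer to lm: it builds its data with model.frame(), which evaluates each term, so transformations such as log(demand) are honoured. A minimal sketch, assuming every term evaluates to a numeric column (`gig3` is a hypothetical name):

```r
gig3 <- function(formula, data) {
  mf <- model.frame(formula, data)  # evaluates each variable/expression in data
  cor(mf)                           # cor() converts the data frame to a numeric matrix
}

gig3(demand ~ Time, BOD)            # same matrix as gig1/gig2
gig3(log(demand) ~ Time, BOD)       # first column is named "log(demand)"
```

Note that model.frame() applies the default na.action, so rows with missing values are dropped before cor() sees them.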

Upvotes: 2

Ben Bolker

Reputation: 226162

Here's a function that takes a formula, transforms it into a call to the cor() function, and then evaluates that call in an environment consisting of the data:

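f <- function(form, data) {
    form[[1]] <- quote(cor)  # replace `~`, so demand ~ Time becomes cor(demand, Time)
    eval(form, data)         # evaluate the new call with the columns of data in scope
}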
f(demand~Time,BOD)
## [1] 0.8030693
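The trick works because a formula is just a call whose first element is `~`; swapping that element turns demand ~ Time into cor(demand, Time). A slightly more defensive sketch of the same idea builds the call explicitly, rejects one-sided formulas, and falls back to the formula's environment for anything not found in `data`. The name `f2` and the length check are additions for illustration, not part of the answer above.

```r
f2 <- function(form, data) {
  if (length(form) != 3) stop("expected a two-sided formula such as y ~ x")
  cl <- as.call(list(quote(cor), form[[2]], form[[3]]))  # cor(<lhs>, <rhs>)
  eval(cl, data, environment(form))  # data first, then the formula's environment
}

f2(demand ~ Time, BOD)
```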

Upvotes: 6

Maurits Evers

Reputation: 50668

Not sure what you're trying to do, but you could take a look at the terms of a formula:

fm <- formula(Time ~ demand)
tms <- terms(fm)
tms
#Time ~ demand
#attr(,"variables")
#list(Time, demand)
#attr(,"factors")
#       demand
#Time        0
#demand      1
#attr(,"term.labels")
#[1] "demand"
#attr(,"order")
#[1] 1
#attr(,"intercept")
#[1] 1
#attr(,"response")
#[1] 1
#attr(,".Environment")
#<environment: R_GlobalEnv>

From tms you could extract relevant entries and attributes. For example,

attr(tms, "variables")
#list(Time, demand)
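The "variables" attribute is itself a call to list(), so evaluating it against the data returns the values of every variable, including any transformed ones. A sketch that turns this into the correlation matrix the question asks for (`gig_terms` is a hypothetical name, and it assumes each term deparses to a single short string):

```r
gig_terms <- function(formula, data) {
  tms  <- terms(formula)
  vars <- attr(tms, "variables")                  # e.g. the call list(Time, demand)
  vals <- eval(vars, data, environment(formula))  # list of evaluated columns
  # Name each column after its term, e.g. "Time", "demand"
  names(vals) <- vapply(as.list(vars)[-1], deparse, character(1))
  cor(do.call(cbind, vals))                       # bind into a matrix, then correlate
}

gig_terms(Time ~ demand, BOD)
```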

Upvotes: 1
