Wagner Jorge

Reputation: 430

How to extract variable names and values from a model formula?

I have a formula with n variables and I need to use it inside a function. To do this, I need to access the objects the formula refers to, e.g.:

e <- new.env()
e$y <- matrix(rnorm(20), ncol = 4)
e$x1 <- 2*matrix(rnorm(20), ncol = 4)
e$x2 <- 2*matrix(rnorm(20), ncol = 4)
e$x3 <- 2*matrix(rnorm(20), ncol = 4)

f = formula(y~x1+x2+x3, env = e)
test <- function(formula){
  # Any function using the variables y, x1, x2 and x3
}

Upvotes: 1

Views: 3340

Answers (1)

Zheyuan Li

Reputation: 73265

Put your formula in the right environment

At the moment, your formula f is still associated with the global environment:

f <- formula(y~x1+x2+x3, env = e)
environment(f)
# <environment: R_GlobalEnv>

The env argument has no effect here, because y~x1+x2+x3 is already a formula. From ?formula:

 env: the environment to associate with the result, if not already
      a formula.

So you need one more step to set the environment of f to the right one:

environment(f) <- e

Alternatively, build the formula from a character string with as.formula(), where the env argument does take effect:

f <- as.formula("y~x1+x2+x3",env=e)
# y ~ x1 + x2 + x3
# <environment: 0xa1bb67c>
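
To confirm which environment a formula ends up in, you can compare environment(f) with e directly. The following is a minimal sketch that reuses the question's variable names with a shorter formula; f_plain, f_fixed and f_string are illustrative names, not part of the original post.

```r
# Environment holding the variables, as in the question
e <- new.env()
e$y  <- matrix(rnorm(20), ncol = 4)
e$x1 <- 2 * matrix(rnorm(20), ncol = 4)

# 'env' is ignored because the first argument is already a formula
f_plain <- formula(y ~ x1, env = e)
identical(environment(f_plain), e)   # FALSE

# Fix 1: reassign the environment after the fact
f_fixed <- f_plain
environment(f_fixed) <- e
identical(environment(f_fixed), e)   # TRUE

# Fix 2: build the formula from a string, so 'env' is honoured
f_string <- as.formula("y ~ x1", env = e)
identical(environment(f_string), e)  # TRUE
```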

Method 1: use model.frame.default()

z <- model.frame.default(f)

str(z)
# 'data.frame': 5 obs. of  4 variables:
#  $ y : num [1:5, 1:4] 0.601 -1.295 -0.312 0.247 -1.545 ...
#  $ x1: num [1:5, 1:4] 1.801 2.177 -1.68 -0.769 -2.371 ...
#  $ x2: num [1:5, 1:4] -2.407 -0.719 2.588 0.431 -0.787 ...
#  $ x3: num [1:5, 1:4] -3.677 -0.638 -1.325 -2.901 -1.013 ...
#   - attr(*, "terms")=Classes 'terms', 'formula'  language y ~ x1 + x2 + x3
#   .. ..- attr(*, "variables")= language list(y, x1, x2, x3)
#   .. ..- attr(*, "factors")= int [1:4, 1:3] 0 1 0 0 0 0 1 0 0 0 ...
#   .. .. ..- attr(*, "dimnames")=List of 2
#   .. .. .. ..$ : chr [1:4] "y" "x1" "x2" "x3"
#   .. .. .. ..$ : chr [1:3] "x1" "x2" "x3"
#   .. ..- attr(*, "term.labels")= chr [1:3] "x1" "x2" "x3"
#   .. ..- attr(*, "order")= int [1:3] 1 1 1
#   .. ..- attr(*, "intercept")= int 1
#   .. ..- attr(*, "response")= int 1
#   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#   .. ..- attr(*, "predvars")= language list(y, x1, x2, x3)
#   .. ..- attr(*, "dataClasses")= Named chr [1:4] "nmatrix.4" "nmatrix.4" "nmatrix.4" "nmatrix.4"
#   .. .. ..- attr(*, "names")= chr [1:4] "y" "x1" "x2" "x3"

You get a model frame (a data frame with a "terms" attribute; see ?model.frame and ?terms.object for more), which you can use like any other data frame.
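
Applied to the question's test() function, this could look roughly as follows. It is only a sketch: the body of test() (returning the dimensions of each variable) is a placeholder assumption, since the question does not say what the function should compute.

```r
# Hypothetical version of the question's test() function
test <- function(formula) {
  # Variables are looked up in environment(formula)
  mf <- model.frame(formula)
  # Placeholder computation: dimensions of every variable
  lapply(mf, dim)
}

e <- new.env()
e$y  <- matrix(rnorm(20), ncol = 4)
e$x1 <- 2 * matrix(rnorm(20), ncol = 4)
e$x2 <- 2 * matrix(rnorm(20), ncol = 4)
e$x3 <- 2 * matrix(rnorm(20), ncol = 4)

f <- y ~ x1 + x2 + x3
environment(f) <- e   # point the formula at the right environment

res <- test(f)
names(res)
```

Because the lookup goes through environment(formula), the function works without a data argument, provided the formula carries the right environment.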


Method 2: use terms.formula and all.vars

You can also evaluate the "variables" attribute of terms.formula() in the formula's environment and name the result with all.vars(). For example:

list_call <- attr(terms.formula(f), "variables")
# list(y, x1, x2, x3)
z <- setNames(eval(list_call, envir = environment(f)), all.vars(f))

str(z)
# List of 4
#  $ y : num [1:5, 1:4] -0.107 -0.32 0.452 -0.427 0.184 ...
#  $ x1: num [1:5, 1:4] -2.254 0.674 3.754 -1.2 0.734 ...
#  $ x2: num [1:5, 1:4] 0.15 1.28 0.15 4.26 2.74 ...
#  $ x3: num [1:5, 1:4] -1.505 -0.25 -0.462 3.136 1.282 ...

This gives you a list.
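
A shorter variant, when the formula contains only plain variable names, is to fetch the objects by name with mget(). This is a sketch of an alternative, not the method shown above; the reduced formula and the name vars are illustrative.

```r
e <- new.env()
e$y  <- matrix(rnorm(20), ncol = 4)
e$x1 <- 2 * matrix(rnorm(20), ncol = 4)
e$x2 <- 2 * matrix(rnorm(20), ncol = 4)

f <- y ~ x1 + x2
environment(f) <- e

# all.vars() returns the variable names as a character vector
vars <- all.vars(f)

# mget() fetches each name from the formula's environment
# and returns a list named after those variables
z <- mget(vars, envir = environment(f))
names(z)
```

Note the difference between the two: for a formula such as y ~ log(x1), the "variables" attribute evaluates log(x1), whereas all.vars() and mget() return the untransformed x1.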


Method 3: get_all_vars() (caution: use with care!)

In principle, this should be the most convenient solution, but it does not work properly for your data.

z <- get_all_vars(f)

str(z)
# 'data.frame': 5 obs. of  16 variables:
#  $ y : num  -0.107 -0.32 0.452 -0.427 0.184
#  $ x1: num  -0.762 0.779 -1.139 0.506 -0.483
#  $ x2: num  0.9873 0.2398 0.5705 0.1761 0.0348
#  $ x3: num  0.287 0.625 0.235 -1.243 -0.146
#  $ NA: num  -2.254 0.674 3.754 -1.2 0.734
#  $ NA: num  0.258 -0.242 -2.28 0.375 6.105
#  $ NA: num  1.483 0.345 0.547 -1.084 -0.813
#  $ NA: num  -2.523 -0.642 -0.403 0.706 1.26
#  $ NA: num  0.15 1.28 0.15 4.26 2.74
#  $ NA: num  0.868 -0.572 0.751 -0.731 -1.912
#  $ NA: num  -0.0673 -0.275 1.0924 1.8836 0.633
#  $ NA: num  0.074 -2.958 -1.564 -1.418 2.05
#  $ NA: num  -1.505 -0.25 -0.462 3.136 1.282
#  $ NA: num  -1.45 1.96 1.27 1.21 -1.04
#  $ NA: num  -0.869 2.991 1.268 -1.601 -0.581
#  $ NA: num  -3.286 0.753 -2.75 3.347 -2.161

This gives you a data frame. But, as you may have noticed, it has 16 variables rather than 4. When any variable in your formula is a matrix, get_all_vars() does not work properly. It is still the handiest approach when there are no matrix variables, so keep this option in mind, too.
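
For comparison, here is a minimal sketch of the case where get_all_vars() behaves as expected, that is, when every variable is a plain vector. The names e2 and f2 and the vector data are assumptions made for illustration; they are not from the question.

```r
# Same setup as the question, but with vectors instead of matrices
e2 <- new.env()
e2$y  <- rnorm(5)
e2$x1 <- rnorm(5)
e2$x2 <- rnorm(5)

f2 <- y ~ x1 + x2
environment(f2) <- e2

# One column per variable in the formula
z <- get_all_vars(f2)
dim(z)    # 5 3
names(z)  # "y" "x1" "x2"
```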

Upvotes: 2
