PingPong

Reputation: 975

Get a vector of input variables' names out of "lm" and "glm" objects

I am trying to get the input variable names out of the model object returned by lm(). I tried to access the attribute 'variables' of lm_obj$terms. However, the returned object is of type 'language' rather than a regular vector of names. For example:

lm_obj = lm(y ~ x + z + z:x, data=df)
attr(lm_obj$terms, 'variables')
# list(y, x, z)

What is a 'language' type? How can I convert this 'language' object to a regular vector like c('x', 'z')?

Upvotes: 2

Views: 517

Answers (3)

Zheyuan Li

Reputation: 73265

You are on the right track: the "terms" object is where you should look. If you want to omit the response variable, you can use delete.response.

all.vars(delete.response(terms(lm_obj)))
#[1] "x" "z"

I would also like to point you to

labels(terms(lm_obj))
#[1] "x"   "z"   "x:z"

which is sometimes more useful.
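As a self-contained sketch (the data frame is simulated, mirroring the reproducible example below), the three calls return different things: all.vars() on the full terms includes the response, delete.response() drops it, and labels() gives term labels rather than variable names.

```r
# Simulated data; the values themselves do not matter here
set.seed(1)
df <- data.frame(y = rnorm(20), x = rnorm(20), z = rnorm(20))
lm_obj <- lm(y ~ x + z + z:x, data = df)

tt <- terms(lm_obj)

all.vars(tt)                   # every variable, response included
#[1] "y" "x" "z"
all.vars(delete.response(tt))  # predictors only
#[1] "x" "z"
labels(tt)                     # term labels, including the interaction
#[1] "x"   "z"   "x:z"
```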


A reproducible example to complement your question

df <- data.frame(y = rnorm(20), x = rnorm(20), z = rnorm(20))
lm_obj <- lm(y ~ x + z + z:x, data = df)

To see why we should look at "terms" rather than elsewhere, try the different answers here on the following model:

## thanks to user "WhatIf" for proposing `model = FALSE`
lmfit <- lm(y ~ poly(x) + z + I(z ^ 2) + z:x, data = df,
            na.action = na.exclude, model = FALSE)
rm(df)  ## do not omit this line! run it before trying other answers
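For instance, a minimal sketch of what happens on lmfit once df is gone (the data are simulated here). The terms-based call still works because the terms are stored inside the fit. Collecting symbols from the call also picks up argument values such as df and na.exclude. With model = FALSE, model.frame() has to re-evaluate data = df, which fails after rm(df); note that the name df then resolves to the F density function stats::df, which is not usable as data either.

```r
set.seed(1)
df <- data.frame(y = rnorm(20), x = rnorm(20), z = rnorm(20))
lmfit <- lm(y ~ poly(x) + z + I(z ^ 2) + z:x, data = df,
            na.action = na.exclude, model = FALSE)
rm(df)

# Works: the terms object lives inside the fit itself
all.vars(delete.response(terms(lmfit)))
#[1] "x" "z"

# Symbols taken from the call include non-variable arguments (df, na.exclude)
all.vars(lmfit$call)

# Expected to fail: no stored model frame, and the original data are gone
mf <- tryCatch(model.frame(lmfit), error = function(e) NULL)
is.null(mf)
```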

Misc Replies

(1) Why does the 'variables' attribute store a 'language' object that spells out a list, rather than a regular "list" object?

Because "terms" is created at an early stage of model fitting, during formula parsing. The actual evaluation of the variables happens later.

(2) How to convert a 'language' object to a regular "list" or "vector"?

Evaluate it in an environment where the quoted variables can be found:

eval(attr(terms(lm_obj), 'variables'), df)
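The evaluated result is an unnamed list. A small sketch of one way to attach names, taken from the quoted expressions themselves (the data are simulated, and var_names/var_list are just illustrative names):

```r
set.seed(1)
df <- data.frame(y = rnorm(20), x = rnorm(20), z = rnorm(20))
lm_obj <- lm(y ~ x + z + z:x, data = df)

vars <- attr(terms(lm_obj), "variables")  # the unevaluated call list(y, x, z)

# Element 1 of the call is the function name `list`; the rest are the variables
var_names <- vapply(as.list(vars)[-1],
                    function(e) paste(deparse(e), collapse = " "),
                    character(1))

# Evaluate the call with df as the environment, then attach the names
var_list <- setNames(eval(vars, df), var_names)
str(var_list)
```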

Upvotes: 4

jay.sf

Reputation: 72593

You may get them out of the call,

fit <- lm(mpg ~ hp, mtcars)

head(all.vars(fit$call), -1)
# [1] "mpg" "hp" 

or from the names of the model frame, which is probably better:

names(model.frame(fit))
# [1] "mpg" "hp" 
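One caveat, shown with a hypothetical model that is not from the question: when the formula contains transformations, the model frame's column names are the deparsed expressions rather than the raw variable names, whereas all.vars() on the formula still returns the underlying variables.

```r
# Hypothetical model with transformed predictors
fit2 <- lm(mpg ~ log(hp) + I(wt^2), data = mtcars)

names(model.frame(fit2))   # expressions: mpg, log(hp), I(wt^2)
all.vars(formula(fit2))    # raw variables: mpg, hp, wt
```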

"language" is the type of the object (as reported by typeof() and storage.mode()), just as "double", "integer" or "list" are. See ?mode for more explanation and nice examples. The R Language Definition gives a detailed explanation and is a nice read anyway.
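A quick sketch of how the different inspection functions describe such an object. Note that mode() and class() report "call" for an unevaluated function call, while typeof() reports "language":

```r
e <- quote(list(x, z))   # an unevaluated call, like the 'variables' attribute

typeof(e)         # "language"
storage.mode(e)   # "language"
mode(e)           # "call"
class(e)          # "call"
is.call(e)        # TRUE

typeof(quote(x))  # "symbol": a bare name is not a call
typeof(y ~ x)     # "language": a formula is a call too
class(y ~ x)      # "formula"
typeof(1)         # "double", for comparison
```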

Update

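The 'variables' attribute of the terms is such a 'language' object. (The output below comes from a model that also includes am, e.g. lm(mpg ~ hp + am, mtcars), rather than the mpg ~ hp model above.)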

(vars <- attr(fit$terms, 'variables'))
# list(mpg, hp, am)

typeof(vars)
# [1] "language"

To make use of it, we can coerce it with as.character() and remove the first element, which is the name of the function being called, i.e. 'list',

as.character(vars)[-1]
# [1] "mpg" "hp"  "am" 

or it can be evaluated, which is probably what it is actually intended for. Obviously this only works if we specify the data object in which the variables can be found.

with(fit$model, eval(vars))
# [[1]]
#  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
# [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
# [31] 15.0 21.4
# 
# [[2]]
#  [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66
# [19]  52  65  97 150 150 245 175  66  91 113 264 175 335 109
# 
# [[3]]
# [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1
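The as.character() route returns each element as a deparsed expression, which equals the variable name only when the formula uses bare variables. A sketch with a hypothetical formula (not from the question) that contains function calls:

```r
# Hypothetical formula with transformed terms
vars <- attr(terms(mpg ~ log(hp) + poly(wt, 2)), "variables")
vars                      # list(mpg, log(hp), poly(wt, 2))

as.character(vars)[-1]    # deparsed expressions: "mpg" "log(hp)" "poly(wt, 2)"
all.vars(vars)            # underlying variable names: "mpg" "hp" "wt"
```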

If we ask why it isn't stored as a regular list, I would answer: because the data are already stored in fit$model. Storing them again would duplicate them and make the object larger. The data may well be needed as a list somewhere, but they are more easily available from fit$model.
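A sketch of the duplication argument, assuming the model includes am (matching the printed 'variables' above): evaluating the call in fit$model yields exactly the columns that fit$model already holds.

```r
# Assumption: am is part of the model, as in the output above
fit  <- lm(mpg ~ hp + am, data = mtcars)
vars <- attr(fit$terms, "variables")   # list(mpg, hp, am)

evaluated <- eval(vars, fit$model)     # a plain list of three numeric vectors
stored    <- unname(as.list(fit$model))

# Same data, so also storing the evaluated list would only duplicate it
all.equal(evaluated, stored)           # TRUE
```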

Upvotes: 1

Mohamed Desouky

Reputation: 4425

Your object lm_obj$terms is a formula, and you can access each of its parts with the [[ extraction operator, like

lm_obj$terms[[1]]

#> `~`  # formula symbol 

If you want to get your input variables, you can use

strsplit(as.character(lm_obj$terms[[3]])[2] , " \\+ ")[[1]]

#> [1] "x" "z"
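Taking element 2 of the deparsed right-hand side relies on the formula having the shape x + z + z:x. A sketch with a hypothetical model without the interaction (dat is made-up data) shows the same code keeping only "x", while the terms-based all.vars() call still finds both predictors:

```r
# Hypothetical model without an interaction term
dat <- data.frame(y = c(1, 3, 2, 5, 4), x = c(1, 2, 3, 4, 5), z = c(2, 1, 4, 3, 6))
fit_simple <- lm(y ~ x + z, data = dat)

rhs <- fit_simple$terms[[3]]                      # the call x + z
as.character(rhs)                                 # "+" "x" "z"
split_vars <- strsplit(as.character(rhs)[2], " \\+ ")[[1]]
split_vars                                        # "x" only: z is missed

all.vars(delete.response(terms(fit_simple)))      # "x" "z"
```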

Upvotes: 0
