Reputation: 975
I am trying to get the input variable names out of the model object returned by the lm() function. I tried to access the 'variables' attribute under lm_obj$terms. However, the returned object is of type 'language' rather than a regular vector of names.
For example:
lm_obj = lm(y ~ x + z + z:x, data=df)
attr(lm_obj$terms, 'variables')
# list(y, x, z)
What is a 'language' type? How can I convert this 'language' object to a regular vector like c('x', 'z')?
Upvotes: 2
Views: 517
Reputation: 73265
You are on the right track. The "terms" object is where you should look. If you want to omit the response variable, you can use delete.response.
all.vars(delete.response(terms(lm_obj)))
#[1] "x" "z"
I would also like to point you to
labels(terms(lm_obj))
#[1] "x" "z" "x:z"
which is sometimes more useful.
A reproducible example to complement your question
df <- data.frame(y = rnorm(20), x = rnorm(20), z = rnorm(20))
lm_obj <- lm(y ~ x + z + z:x, data = df)
To see why we should look at "terms" rather than elsewhere, you may try the other answers here on the following model:
## thanks to user "WhatIf" for proposing `model = FALSE`
lmfit <- lm(y ~ poly(x) + z + I(z ^ 2) + z:x, data = df,
            na.action = na.exclude, model = FALSE)
rm(df) ## do not omit this line! run it before trying other answers
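The point of that test can be sketched as follows. This is only an illustration under the assumptions of the example above (simulated data, an arbitrary seed, and the names vars_from_terms and mf, which are made up here): once the data is gone and no model frame is stored, "terms" still works, while anything that has to rebuild the model frame does not.

```r
set.seed(1)  # arbitrary seed, only so the simulated data is reproducible
df <- data.frame(y = rnorm(20), x = rnorm(20), z = rnorm(20))
lmfit <- lm(y ~ poly(x) + z + I(z ^ 2) + z:x, data = df,
            na.action = na.exclude, model = FALSE)
rm(df)

## the "terms" object is stored inside the fit, so no data is needed
vars_from_terms <- all.vars(delete.response(terms(lmfit)))
print(vars_from_terms)

## model.frame() must rebuild the frame from the data, which no longer exists;
## the error is caught here and turned into NULL
mf <- tryCatch(names(model.frame(lmfit)), error = function(e) NULL)
print(is.null(mf))
```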
Misc Replies
(1) Why does the 'variables' attribute store a 'language' object that spells out a list, rather than a regular "list" object?
Because "terms" is created at an early stage of model fitting, when the formula is parsed. The variables themselves are evaluated later.
(2) How to convert a 'language' object to a regular "list" or "vector"?
We evaluate it in a place where the quoted variables can be found:
eval(attr(terms(lm_obj), 'variables'), df)
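If only the names are wanted and the data is not available, the 'language' object can also be taken apart without evaluating it. A minimal sketch, assuming the formula from the question; the names tt, parts and var_names are made up for this illustration:

```r
tt <- terms(y ~ x + z + z:x)   # "terms" can be built from the formula alone
vars <- attr(tt, "variables")

print(typeof(vars))  # storage type of the unevaluated call
print(class(vars))

## a call can be split into its parts: the function name, then its arguments
parts <- as.list(vars)
var_names <- vapply(parts[-1], deparse, character(1))
print(var_names)
```

Note that the response is included here; drop it with delete.response() first if it is not wanted.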
Upvotes: 4
Reputation: 72593
You may get them out of the call,
fit <- lm(mpg ~ hp, mtcars)
head(all.vars(fit$call), -1)
# [1] "mpg" "hp"
or from the names of the model.frame, which is probably better.
names(model.frame(fit))
# [1] "mpg" "hp"
"language" is the (storage) mode or typeof of the object, just as "double", "integer" or "list" are. See ?mode for more explanation and nice examples. The R Language Definition has a detailed explanation, and is a nice read anyway.
The 'variables' attribute of the terms is such a 'language' object.
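fit <- lm(mpg ~ hp + am, mtcars)  ## refit; the output below corresponds to a model that also includes am
(vars <- attr(fit$terms, 'variables'))
# list(mpg, hp, am)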
typeof(vars)
# [1] "language"
To make use of it, we may coerce it with as.character and remove the first element, which is the name of the function being called, i.e. 'list',
as.character(vars)[-1]
# [1] "mpg" "hp" "am"
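One caveat, sketched here with a hypothetical model that is not part of the original answer: as.character returns one string per expression in the formula, so transformed terms come back as written, whereas all.vars returns the underlying variable names.

```r
## hypothetical model with a transformed predictor, for illustration only
fit2 <- lm(mpg ~ log(hp) + am, data = mtcars)
vars2 <- attr(terms(fit2), "variables")

expr_names <- as.character(vars2)[-1]  # expressions as written in the formula
raw_names  <- all.vars(terms(fit2))    # names of the variables they refer to

print(expr_names)
print(raw_names)
```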
or it can be evaluated, which is probably what it is actually intended for. Obviously this only works if we state, using with, in which data object the variables can be found.
with(fit$model, eval(vars))
# [[1]]
# [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
# [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
# [31] 15.0 21.4
#
# [[2]]
# [1] 110 110 93 110 175 105 245 62 95 123 123 180 180 180 205 215 230 66
# [19] 52 65 97 150 150 245 175 66 91 113 264 175 335 109
#
# [[3]]
# [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1
As for why it isn't stored as a regular list: I would say it is because the information is already stored in fit$model. Storing it again would duplicate it, and the object would grow in size. The data is probably needed as a list somewhere, but the information itself is more easily available through fit$model.
Upvotes: 1
Reputation: 4425
In your object, lm_obj$terms is a formula, and you can access each of its parts with the [[ extraction operator, for example
lm_obj$terms[[1]]
#> `~` # formula symbol
If you want to get your input variables, you can use
strsplit(as.character(lm_obj$terms[[3]])[2] , " \\+ ")[[1]]
#> [1] "x" "z"
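The string-splitting approach depends on the exact shape of the formula. As a hedged sketch (the extra predictor w and the names f, rhs, fragile and robust are made up for this illustration), adding one more term already changes its result, while all.vars on the right-hand side still returns the variable names:

```r
f <- y ~ x + z + z:x + w   # `w` is a hypothetical extra predictor
rhs <- f[[3]]              # right-hand side of the formula, as a call

## splitting the deparsed text only sees the left operand of the outermost `+`
fragile <- strsplit(as.character(rhs)[2], " \\+ ")[[1]]

## all.vars() walks the whole expression and collects the variable names
robust <- all.vars(rhs)

print(fragile)
print(robust)
```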
Upvotes: 0