Reputation: 378
I have a data frame in which I want to predict each variable from all the other variables, so I construct a loop like this one:
df = iris
df$Species <- NULL
mods = list()
for (i in 1:ncol(df)) {
mods[[i]] <- lm(df[, i] ~ ., df)
}
But, to my surprise, each variable appears as its own predictor, even if I do:
mods = list()
for (i in 1:ncol(df)) {
mods[[i]] = lm(df[, i] ~ . - df[, i], df)
}
The same happens.
I know I can create the correct formula expression on the side with the proper names and so on, but I feel like this shouldn't be the desired behaviour for lm.
The question is: Am I missing something? Is there a reason why this function has such an uncomfortable behaviour? In case the answer to the previous questions is "no", shouldn't it be improved?
Upvotes: 0
Views: 862
Reputation: 269371
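The . excludes variables that appear by name on the left-hand side, but that code does not use any names: df[, i] is an expression, not a column name. Build the formula from the column names instead: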
df = iris
df$Species <- NULL
LM <- function(nm) {
fo <- paste(nm, "~.")
do.call("lm", list(fo, quote(df)))
}
Map(LM, names(df))
giving this 4-element list (only the first element is shown):
$Sepal.Length
Call:
lm(formula = "Sepal.Length ~.", data = df)
Coefficients:
 (Intercept)   Sepal.Width  Petal.Length   Petal.Width
      1.8560        0.6508        0.7091       -0.5565
## ..snip...
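As a variation on the same idea, here is a minimal sketch that lists the predictors explicitly with reformulate() instead of relying on the dot. The helper name fit_one is made up for illustration and is not part of any package; reformulate(), setdiff() and setNames() are base/stats functions.

```r
df <- iris
df$Species <- NULL

# Hypothetical helper: regress column `nm` on every other column of `data`,
# with the formula built from column names so the response is never a predictor.
fit_one <- function(nm, data) {
  fo <- reformulate(setdiff(names(data), nm), response = nm)
  lm(fo, data = data)
}

# Named list of models, one per column.
mods <- lapply(setNames(names(df), names(df)), fit_one, data = df)

# Coefficient names of the first model; the response should not be among them.
print(names(coef(mods$Sepal.Length)))
```

One trade-off: the stored call reads lm(formula = fo, data = data) rather than showing the actual formula, which is what the do.call() version above avoids.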
Upvotes: 2
Reputation:
This seems expected to me, and very much in line with how R operates. You are passing df as the data argument, but then referencing df directly in your formula (it is the same data frame, but inside the formula it is just an expression evaluated on its own, not a column of data).

In your first example, your y variable is not taken from data by name; it comes from that direct reference to df. So no column of data is recognised as the response, and the . expands to all columns.

In your second example, you are saying to include all variables from data but exclude the term df[, i]. That term is not one of the column names the . expanded to, so nothing is removed and you are still left with all the columns from data.
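To see the expansion without fitting anything, one option is to inspect the term labels that terms() produces for each formula. This is only a sketch of the behaviour described above; the variable names by_name, by_expr and minus_expr are mine.

```r
df <- iris
df$Species <- NULL

# Response given by name: the dot skips Sepal.Length.
by_name <- attr(terms(Sepal.Length ~ ., data = df), "term.labels")

# Response given as an expression: no column name appears on the left-hand
# side, so the dot expands to every column, including the response itself.
by_expr <- attr(terms(df[, 1] ~ ., data = df), "term.labels")

# Subtracting the expression does not help: df[, 1] is not one of the
# column-name terms that the dot expanded to.
minus_expr <- attr(terms(df[, 1] ~ . - df[, 1], data = df), "term.labels")

print(by_name)
print(by_expr)
print(minus_expr)
```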
I think this is what you are expecting:
mods = list()
for (i in 1:ncol(df)) {
mods[[i]] = lm(df[, i] ~ ., df[, -i])
}
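A quick way to check the result, sketched here with lapply() in place of the for loop (the behaviour should be the same), is to name the models after their response and look at the coefficient names:

```r
df <- iris
df$Species <- NULL

# Same idea as the loop: the response comes from df, the predictors from df[, -i].
mods <- lapply(seq_along(df), function(i) lm(df[, i] ~ ., data = df[, -i]))
names(mods) <- names(df)

# The response column should be absent from each model's coefficients.
print(names(coef(mods$Sepal.Length)))
```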
Upvotes: 2