R linear model function (lm) doesn't exclude predicted variable from predictors

Question

I have a dataframe where I want to predict all variables from the other variables, so I construct a loop like this one:

df = iris
df$Species <- NULL

mods = list()
for (i in 1:ncol(df)) {
  mods[[i]] <- lm(df[, i] ~ ., df)
}

But, to my surprise, each variable appears as its own predictor, even if I do:

mods = list()
for (i in 1:ncol(df)) {
  mods[[i]] = lm(df[, i] ~ . - df[, i], df)
}

The same happens.

I know I can build the correct formula separately, with the proper column names, but I feel this shouldn't be the intended behaviour of lm.

The question is: am I missing something? Is there a reason why this function behaves so inconveniently? If the answer to both questions is "no", shouldn't it be improved?

user10917479 · Accepted Answer

This seems expected to me, and very much in line with how R operates. You are passing df as the data argument, but your formula references df directly. It is the same data frame, but df[, i] in the formula is not looked up as a column of data; it is evaluated in the calling environment as a separate object reference.

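In your first example, your y variable does not come from data; it comes from that outside df. Its name, "df[, i]", matches none of the columns of data, so no column is excluded and the . expands to all of them, including the one you are predicting.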
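A quick way to see this is to compare the coefficient names when the response is written as df[, 1] versus by its column name, Sepal.Length. This is a minimal sketch, assuming the same df built from iris as in the question:

```r
df <- iris
df$Species <- NULL

# Response written as an expression: "df[, 1]" is not a column name of data,
# so "." expands to all four columns, including Sepal.Length itself.
m_expr <- lm(df[, 1] ~ ., data = df)
print(names(coef(m_expr)))
# "(Intercept)" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

# Response written as a column name: "." now means "every other column".
m_name <- lm(Sepal.Length ~ ., data = df)
print(names(coef(m_name)))
# "(Intercept)" "Sepal.Width" "Petal.Length" "Petal.Width"
```

The "." in a formula means "all columns of data not otherwise in the formula", and that comparison is made by name, which is why only the second form drops the response.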

In your second example, you are saying to include all columns of data and then remove the term df[, i]. But df[, i] is not one of the terms that . expanded to (those are the column names, such as Sepal.Length), so nothing is removed and you are still left with all the columns from data.
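The same check on the second formula shows that subtracting df[, i] leaves every column in the model. This is a sketch with i fixed to 1 purely for illustration, again assuming the iris-based df:

```r
df <- iris
df$Species <- NULL
i <- 1

# "- df[, i]" removes a term called "df[, i]", which "." never produced,
# so all four columns of data survive as predictors.
m_minus <- lm(df[, i] ~ . - df[, i], data = df)
print(names(coef(m_minus)))
# "(Intercept)" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```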

I think this is what you are expecting:

mods = list()
for (i in 1:ncol(df)) {
  # drop column i from data so that . cannot include it
  mods[[i]] = lm(df[, i] ~ ., df[, -i])
}
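To confirm the fix, and to show the name-based alternative mentioned in the question, here is a sketch assuming the same df. It builds each formula from the column name with as.formula(paste(...)), so . excludes the response on its own; mods_by_name is a hypothetical name chosen for illustration:

```r
df <- iris
df$Species <- NULL

# The answer's fix: drop column i from data, so "." cannot see it.
mods <- list()
for (i in seq_along(df)) {
  mods[[i]] <- lm(df[, i] ~ ., data = df[, -i])
}
print(names(coef(mods[[1]])))
# "(Intercept)" "Sepal.Width" "Petal.Length" "Petal.Width"

# Name-based alternative: formulas like "Sepal.Length ~ .", where "."
# skips the named response automatically.
mods_by_name <- lapply(names(df), function(v) {
  lm(as.formula(paste(v, "~ .")), data = df)
})
names(mods_by_name) <- names(df)

# Both approaches give the same coefficients for the first variable.
print(all.equal(coef(mods[[1]]), coef(mods_by_name[["Sepal.Length"]])))
# TRUE
```

Naming the list by column also lets you fetch a model by variable, e.g. mods_by_name[["Petal.Width"]].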
