Eudald

R linear model function (lm) doesn't exclude predicted variable from predictors

I have a data frame and want to predict each variable from all the other variables, so I built a loop like this one:

df = iris
df$Species <- NULL

mods = list()
for (i in 1:ncol(df)) {
  mods[[i]] <- lm(df[, i] ~ ., df)
}

But, to my surprise, each variable appears as its own predictor, even if I do:

mods = list()
for (i in 1:ncol(df)) {
  mods[[i]] = lm(df[, i] ~ . - df[, i], df)
}

The same happens.

I know I can build the correct formula separately, using the proper column names, but I feel this shouldn't be the intended behaviour of lm.

The question is: am I missing something? Is there a reason why this function behaves so inconveniently? If the answer to both questions is "no", shouldn't it be improved?


Answers (2)

G. Grothendieck

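The . in a formula excludes variables that appear in the formula by name, but that code does not use any names: df[, i] is an expression, not a column name, so . still expands to every column of df, including column i. Building the formula from the column name fixes this: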

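df = iris
df$Species <- NULL

LM <- function(nm) {
  fo <- paste(nm, "~.")  # e.g. "Sepal.Length ~.": the response is now a column name
  # do.call inserts the formula text into the call; quote(df) keeps the
  # printed call short (data = df) instead of embedding the whole data frame
  do.call("lm", list(fo, quote(df)))
}
Map(LM, names(df))  # one model per column, named by response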

This gives a 4-element list (only the first element is shown):

$Sepal.Length

Call:
lm(formula = "Sepal.Length ~.", data = df)

Coefficients:
 (Intercept)   Sepal.Width  Petal.Length   Petal.Width  
      1.8560        0.6508        0.7091       -0.5565  

## ..snip...
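A related sketch, not part of the original answer: the formula object can also be built directly with reformulate() from the stats package instead of pasting a string. The response name goes on the left-hand side and only the remaining column names on the right, so the response can never appear among the predictors. The helper name fit_all is hypothetical.

```r
df <- iris
df$Species <- NULL

# Hypothetical helper: fit one model per column, each column predicted from all the others
fit_all <- function(data) {
  fits <- lapply(names(data), function(nm) {
    # e.g. Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width
    fo <- reformulate(setdiff(names(data), nm), response = nm)
    lm(fo, data = data)
  })
  setNames(fits, names(data))  # name each model after its response
}

mods <- fit_all(df)
coef(mods$Sepal.Length)
```

One trade-off: printing a model shows the call as lm(formula = fo, data = data) rather than the expanded formula, which is what the do.call version above avoids.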


user10917479

This seems expected to me, and very much in line with how R operates. You pass df as the data argument, but then refer to df again directly in your formula. It is the same data frame, but df[, i] is evaluated as an outside object rather than looked up as a column of data.

In your first example, your response is not taken from data; it comes from that outside reference to df. So no column of data appears on the left-hand side, and . expands to all of them, including the one you are predicting.

In your second example, you ask for all columns of data minus the term df[, i], an expression evaluated from the outside df. Since that expression does not match any of the column names that . expanded to, nothing is removed, and you are still left with all the columns from data.
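A minimal way to see this, assuming the same df as in the question: terms() expands . against data without fitting anything, and its "term.labels" attribute lists the predictors that result. The variable names labels_expr and labels_name are only for illustration.

```r
df <- iris
df$Species <- NULL
i <- 1

# "." means "all columns of data not otherwise in the formula".
# df[, i] is an expression, not a column name, so no column is skipped.
labels_expr <- attr(terms(df[, i] ~ ., data = df), "term.labels")
print(labels_expr)  # all four columns, including Sepal.Length

# With the response given by name, "." skips that column.
labels_name <- attr(terms(Sepal.Length ~ ., data = df), "term.labels")
print(labels_name)  # only the three other columns
```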

I think this is what you are expecting:

mods = list()
for (i in 1:ncol(df)) {
  # data = df[, -i] drops column i, so . expands to the other columns only
  mods[[i]] = lm(df[, i] ~ ., df[, -i])
}
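As a sanity check (a sketch, not part of the original answer), the df[, -i] loop can be compared with models whose response is named explicitly via reformulate(); the fitted coefficients should agree, and only the response label differs. The names mods_named and same are illustrative.

```r
df <- iris
df$Species <- NULL

# Approach from the answer above: response from df, predictors from df[, -i]
mods <- list()
for (i in seq_len(ncol(df))) {
  mods[[i]] <- lm(df[, i] ~ ., df[, -i])
}

# For comparison: the same models with the response named explicitly
mods_named <- lapply(names(df), function(nm) {
  lm(reformulate(setdiff(names(df), nm), response = nm), data = df)
})

# Compare coefficients model by model, ignoring their labels
same <- mapply(function(a, b) isTRUE(all.equal(unname(coef(a)), unname(coef(b)))),
               mods, mods_named)
print(same)  # TRUE for every model
```

One caveat of the loop version: each stored call refers to the global i, so re-evaluating a model later (for example with update()) uses whatever value i has at that point, not the value it had when the model was fitted.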

