LeelaSella

Reputation: 807

Using column numbers not names in lm()

Instead of something like lm(bp~height+age, data=mydata) I would like to specify the columns by number, not name.

I tried lm(mydata[[1]]~mydata[[2]]+mydata[[3]]), but in the fitted model the coefficients are then named mydata[[2]], mydata[[3]], etc., whereas I would like them to carry the real column names.

Perhaps this is a case of wanting to have my cake and eat it too, but I would be grateful if the experts could advise whether this is possible.
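A minimal reproducible sketch of the problem, assuming a small data frame mydata with columns bp, height and age (the values are made up purely for illustration):

```r
# Hypothetical example data; the values are invented for illustration only
mydata <- data.frame(
  bp     = c(120, 125, 130, 118, 140, 135),
  height = c(170, 165, 180, 160, 175, 172),
  age    = c(30, 45, 50, 25, 60, 55)
)

# Indexing columns inside the formula fits the right model, but the
# term labels are the deparsed expressions, not the column names
fit <- lm(mydata[[1]] ~ mydata[[2]] + mydata[[3]])
names(coef(fit))
# "(Intercept)" "mydata[[2]]" "mydata[[3]]"
```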

Upvotes: 20

Views: 22105

Answers (2)

Denis Kazakov

Reputation: 79

lm(mydata[,1] ~ ., mydata[-1])

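The trick, which I found in a course on R, is to remove the response column from data; otherwise you get the warning "essentially perfect fit: summary may be unreliable". The reason is that . stands for all columns of data not otherwise in the formula. Here the left-hand side is the expression mydata[,1], not the column name, so if column 1 stays in data the . pulls it in as a predictor as well, and the response ends up regressed on itself. That is why, unlike the usual practice of keeping the response column in data, it has to be dropped here.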
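To see why removing the response column matters, terms() can show how . is expanded for each version of data. A sketch, assuming the same made-up mydata with columns bp, height and age:

```r
# Hypothetical example data; the values are invented for illustration only
mydata <- data.frame(
  bp     = c(120, 125, 130, 118, 140, 135),
  height = c(170, 165, 180, 160, 175, 172),
  age    = c(30, 45, 50, 25, 60, 55)
)

# With column 1 still in data, '.' expands to every column, including bp,
# because the left-hand side is the expression mydata[, 1], not the name bp
labels_full <- attr(terms(mydata[, 1] ~ ., data = mydata), "term.labels")
labels_full
# "bp" "height" "age"

# Dropping column 1 leaves only the intended predictors
labels_drop <- attr(terms(mydata[, 1] ~ ., data = mydata[-1]), "term.labels")
labels_drop
# "height" "age"

# The fitted coefficients carry the real column names
fit <- lm(mydata[, 1] ~ ., data = mydata[-1])
names(coef(fit))
# "(Intercept)" "height" "age"
```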

And a simplified version of the answer by Tomas, which uses all remaining columns as predictors:

lm(
    as.formula(paste(colnames(mydata)[1], "~ .")),
    data=mydata
)
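If only some predictors are wanted, one option is to subset data by column number first and let . expand over what is left. A sketch, assuming the same made-up mydata; the index vector c(1, 3) is just an example (response first, then the predictors to keep):

```r
# Hypothetical example data; the values are invented for illustration only
mydata <- data.frame(
  bp     = c(120, 125, 130, 118, 140, 135),
  height = c(170, 165, 180, 160, 175, 172),
  age    = c(30, 45, 50, 25, 60, 55)
)

cols <- c(1, 3)        # example indices: response, then predictors
sub  <- mydata[cols]   # keep only the selected columns

# The formula is "bp ~ .", so '.' expands to the remaining columns of sub
fit <- lm(as.formula(paste(colnames(sub)[1], "~ .")), data = sub)
names(coef(fit))
# "(Intercept)" "age"
```

Because the response is named by its column name here, it also shows up as bp in the model output rather than as an indexing expression.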

Upvotes: 2

Tomas

Reputation: 59435

lm(
    as.formula(paste(colnames(mydata)[1], "~",
        paste(colnames(mydata)[c(2, 3)], collapse = "+"),
        sep = ""
    )),
    data=mydata
)

Instead of c(2, 3) you can pass as many column indices as you like (no need for a for loop).
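Base R's stats::reformulate() builds the same kind of formula from character vectors without the nested paste() calls; this is a swapped-in alternative, not part of the original answer. The helper lm_by_index() below is hypothetical, and the sketch assumes syntactic column names such as the made-up bp, height and age:

```r
# Hypothetical example data; the values are invented for illustration only
mydata <- data.frame(
  bp     = c(120, 125, 130, 118, 140, 135),
  height = c(170, 165, 180, 160, 175, 172),
  age    = c(30, 45, 50, 25, 60, 55)
)

# Hypothetical helper: fit lm() with response and predictors given by column number
lm_by_index <- function(data, response, predictors) {
  nms <- colnames(data)
  # reformulate() turns the selected names into e.g. bp ~ height + age
  f <- reformulate(nms[predictors], response = nms[response])
  lm(f, data = data)
}

fit <- lm_by_index(mydata, 1, c(2, 3))
names(coef(fit))
# "(Intercept)" "height" "age"
```

Since the formula is built inside the helper, the stored call shows formula = f rather than the expanded formula; formula(fit) recovers the formula that was actually fitted.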

Upvotes: 34
