iambwoo

Reputation: 15

Setting dependent variable via numeric indexing in linear model in R

I'm trying to use a column name stored as a string (or a specific element of a character vector) as my dependent variable (DV) in a linear model in R.

When I type the name ITEM26 directly into the formula, there are no errors. The DV (y) is ITEM26, and the predictors are all the other variables in the data frame.

> lm(ITEM26 ~ ., data = M.compsexitems)

I now want to set the DV using colnames() with numeric indexing, which returns "ITEM26" when I take the first element. (My ultimate goal is to set up a for loop so that I can quickly fit a separate linear model with each column as the DV.)

> colnames(M.compsexitems)[1]
[1] "ITEM26"

When I try setting up a linear model by using the colnames function and numeric indexing, however, I get an error.

> lm(colnames(M.compsexitems)[1] ~ ., data = M.compsexitems)
Error in model.frame.default(formula = colnames(M.compsexitems)[1] ~ ., : 
  variable lengths differ (found for 'ITEM26')

I get the same error if I manually create a vector of item names (sexitems) and refer to a specific element of that vector by indexing.

> sexitems
 [1] "ITEM26" "ITEM27" 

> summary(lm(sexitems[1] ~ ., data = M.compsexitems))$r.squared 
Error in model.frame.default(formula = sexitems[1] ~ ., data = M.compsexitems,  : 
  variable lengths differ (found for 'ITEM26')

Does anyone know why this error occurs, or how to get around it? I suspect that lm() isn't treating the indexed vector element the same way as a variable name in the data frame, but I'm not sure why.
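A quick way to see what is going on, as far as I can tell: a formula is not evaluated when it is created, so its left-hand side stays as the call colnames(M.compsexitems)[1] rather than turning into the symbol ITEM26. When model.frame() later evaluates that call, it gets a character vector of length 1, while each real column has one value per row. Because the response is not a column name, the `.` also expands to ITEM26, which is consistent with the error naming 'ITEM26'. The sketch below rebuilds the dummy data with data.frame() (an assumption standing in for the real data) and inspects the formula directly.

```r
# Stand-in for the question's dummy data frame
M.compsexitems <- data.frame(ITEM26 = c(2, 3), ITEM27 = c(4, 5))

# The same formula the question used; creating it evaluates nothing
f <- colnames(M.compsexitems)[1] ~ .

# The left-hand side is stored as an unevaluated call, not as the symbol ITEM26
lhs <- f[[2]]
is.call(lhs)          # TRUE
all.vars(lhs)         # "M.compsexitems": the only variable it refers to

# Evaluating that call yields one string, not one value per row
resp <- eval(lhs, M.compsexitems, environment(f))
resp                  # "ITEM26"
length(resp)          # 1
nrow(M.compsexitems)  # 2, hence "variable lengths differ"
```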

Example dummy data frame that reproduces the problems above:

> M.compsexitems
  ITEM26 ITEM27
1      2      4
2      3      5

Thank you in advance for your assistance.

Upvotes: 1

Views: 847

Answers (1)

G. Grothendieck

Reputation: 269481

Running lm using the first column as dependent variable and all other columns as independent variables can be done like this:

fm <- lm(M.compsexitems)

giving:

> fm
Call:
lm(formula = M.compsexitems)

Coefficients:
(Intercept)       ITEM27  
         -2            1 
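This works because, for a data frame, formula() treats the first column as the response and all remaining columns as predictors. A minimal sketch, assuming the same two-column dummy data, showing that reordering the columns is one way to change which column lm() uses as the response:

```r
# Stand-in for the question's dummy data frame
M.compsexitems <- data.frame(ITEM26 = c(2, 3), ITEM27 = c(4, 5))

# formula() on a data frame: first column ~ remaining columns
fo26 <- formula(M.compsexitems)
all.vars(fo26)   # "ITEM26" "ITEM27"

# Put ITEM27 first so that it becomes the response instead
fm27 <- lm(M.compsexitems[c("ITEM27", "ITEM26")])
coef(fm27)       # intercept 2, slope on ITEM26 of 1
```

The same column-reordering trick can be applied inside a loop over names(M.compsexitems), although building the formula from strings is usually easier to read.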

If you need to get the formula explicitly:

fo <- formula(fm)

giving:

> fo
ITEM26 ~ ITEM27
<environment: 0x000000000e2f2b50>

If you want the above formula to appear explicitly in the output of lm, then:

do.call("lm", list(fo, quote(M.compsexitems)))

giving:

Call:
lm(formula = ITEM26 ~ ITEM27, data = M.compsexitems)

Coefficients:
(Intercept)       ITEM27  
         -2            1  
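do.call() works here because it builds the call from the formula object itself rather than from the name fo. A different base-R technique that I believe gives the same stored call is bquote(), which splices the value of fo into the call at .(fo) before it is evaluated. A short sketch, assuming the same dummy data:

```r
# Stand-in for the question's dummy data frame
M.compsexitems <- data.frame(ITEM26 = c(2, 3), ITEM27 = c(4, 5))
fo <- formula(paste(names(M.compsexitems)[1], "~ ."))

# bquote() substitutes the formula object for .(fo); eval() then runs the call
fm <- eval(bquote(lm(.(fo), data = M.compsexitems)))

# The stored call contains the formula itself, not the symbol fo
fm$call
```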

If it's a huge regression and you don't want to run the large computation twice, then either run it the first time on head(M.compsexitems) or, alternatively, construct the formula from character strings:

fo <- formula(paste(names(M.compsexitems)[1], "~."))
do.call("lm", list(fo, quote(M.compsexitems)))

giving:

Call:
lm(formula = ITEM26 ~ ., data = M.compsexitems)

Coefficients:
(Intercept)       ITEM27  
         -2            1  
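The string-based formula also covers the question's end goal of a for loop that uses each column in turn as the DV. A minimal sketch; the three-column data below is a hypothetical stand-in with made-up values (the two-row dummy data would give a perfect fit with every model), and the vector r2 is an illustrative name:

```r
# Hypothetical stand-in data; values are made up for illustration only
M.compsexitems <- data.frame(
  ITEM26 = c(2, 3, 5, 4, 6),
  ITEM27 = c(4, 5, 4, 6, 7),
  ITEM28 = c(1, 3, 2, 5, 4)
)

# Fit one model per column: that column is the response and all other
# columns are predictors; store each model's R-squared under the column name
r2 <- numeric(0)
for (nm in names(M.compsexitems)) {
  fo <- as.formula(paste(nm, "~ ."))
  r2[nm] <- summary(lm(fo, data = M.compsexitems))$r.squared
}
r2
```

Because the formula is built from a character string, `.` is expanded relative to a real column name, so the response column is correctly left out of the predictors.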

Note

The input M.compsexitems used, in reproducible form, was:

Lines <- "
  ITEM26         ITEM27
1          2          4
2          3          5"
M.compsexitems <- read.table(text = Lines)

Upvotes: 2
