Reputation: 15
I'm trying to set the name of a column (or a specific vector element) as my dependent variable (DV) in a linear model in R.
When I do this manually by typing "ITEM26", there are no errors. The DV (y) is ITEM26, and the predictors are every other variable in the data frame.
> lm(ITEM26 ~ ., data = M.compsexitems)
I now want to set the DV in a linear model using the colnames function and numeric indexing, which returns "ITEM26" for the first element. (My ultimate goal is to set up a for loop so that I can quickly use each column name in turn as the DV of a separate linear model.)
> colnames(M.compsexitems)[1]
[1] "ITEM26"
When I try setting up a linear model by using the colnames function and numeric indexing, however, I get an error.
> lm(colnames(M.compsexitems)[1] ~ ., data = M.compsexitems)
Error in model.frame.default(formula = colnames(M.compsexitems)[1] ~ ., :
variable lengths differ (found for 'ITEM26')
I get the same error if I manually create a vector of item names (sexitems), and refer to a specific element in the vector via indexing.
> sexitems
[1] "ITEM26" "ITEM27"
> summary(lm(sexitems[1] ~ ., data = M.compsexitems))$r.squared
Error in model.frame.default(formula = sexitems[1] ~ ., data = M.compsexitems, :
variable lengths differ (found for 'ITEM26')
Does anyone know why this error occurs, or how to work around it? I have a feeling that the lm function isn't treating the indexed vector element as if it were a variable in the data frame, but I'm not sure why.
Example dummy data frame on which the above problems hold true:
> M.compsexitems
  ITEM26 ITEM27
1      2      4
2      3      5
Thank you in advance for your assistance.
Upvotes: 1
Views: 847
Reputation: 269481
Running lm using the first column as the dependent variable and all other columns as independent variables can be done like this:
fm <- lm(M.compsexitems)
giving:
> fm
Call:
lm(formula = M.compsexitems)
Coefficients:
(Intercept) ITEM27
-2 1
If you need to get the formula explicitly:
fo <- formula(fm)
giving:
> fo
ITEM26 ~ ITEM27
<environment: 0x000000000e2f2b50>
If you want the above formula to appear explicitly in the output of lm, then:
do.call("lm", list(fo, quote(M.compsexitems)))
giving:
Call:
lm(formula = ITEM26 ~ ITEM27, data = M.compsexitems)
Coefficients:
(Intercept) ITEM27
-2 1
If it's a huge regression and you don't want to run the large computation twice, then run the first fit on head(M.compsexitems) just to obtain the formula, or alternatively construct the formula from character strings:
fo <- formula(paste(names(M.compsexitems)[1], "~."))
do.call("lm", list(fo, quote(M.compsexitems)))
giving:
Call:
lm(formula = ITEM26 ~ ., data = M.compsexitems)
Coefficients:
(Intercept) ITEM27
-2 1
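As for why the string-built formula works where the original call did not: as far as I can tell, a formula written as sexitems[1] ~ . keeps its left-hand side as the unevaluated expression sexitems[1], which later evaluates to a length-1 character vector rather than to the column. The minimal sketch below illustrates this on the two-row example data; the names fo_bad and fo_good are made up for the illustration.

```r
# Example data and name vector from the question.
M.compsexitems <- data.frame(ITEM26 = c(2, 3), ITEM27 = c(4, 5))
sexitems <- names(M.compsexitems)

# Writing the indexing expression directly in the formula does not
# substitute the column name; the left-hand side stays an unevaluated call.
fo_bad <- sexitems[1] ~ .
class(fo_bad[[2]])         # "call"
length(eval(fo_bad[[2]]))  # 1, while the data frame has 2 rows

# Pasting the name into a string first yields the symbol ITEM26 instead.
fo_good <- as.formula(paste(sexitems[1], "~ ."))
class(fo_good[[2]])        # "name"
```

The length-1 response against two-row columns is consistent with the "variable lengths differ" message in the question.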
The input M.compsexitems used, in reproducible form, was:
Lines <- "
ITEM26 ITEM27
1 2 4
2 3 5"
M.compsexitems <- read.table(text = Lines)
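For the stated goal of fitting one model per column, the character-string construction can be wrapped in a loop. This is only a sketch: dat is a hypothetical data frame with made-up values (a few more rows than the two-row example, so that the fits are not exact), and sapply stands in for an explicit for loop.

```r
# Hypothetical stand-in for M.compsexitems; the values are invented.
dat <- data.frame(
  ITEM26 = c(2, 3, 5, 4, 6),
  ITEM27 = c(4, 5, 6, 6, 8),
  ITEM28 = c(1, 3, 2, 5, 4)
)

# Use each column in turn as the DV, with all other columns as predictors,
# and collect the R-squared of each fit in a named numeric vector.
r2 <- sapply(names(dat), function(dv) {
  fo <- as.formula(paste(dv, "~ ."))  # e.g. ITEM26 ~ .
  summary(lm(fo, data = dat))$r.squared
})
r2
```

If the fitted models themselves are needed rather than just R-squared, lapply with the same formula construction would return them as a list.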
Upvotes: 2