How to use one variable in regression with many independent variables in lm()

I need to reproduce this code using all of these variables.

composite <- read.csv("file.csv", header = TRUE, stringsAsFactors = FALSE)
composite <- subset(composite, select = -Date)
model1 <- lm(indepvariable ~., data = composite, na.action = na.exclude)
composite is a data frame with 82 variables.

UPDATE:

I have found a way to create an object that contains only the significantly correlated variables, to narrow down the number of independent variables.

I now have a variable, sigvars, which holds the names taken from a sorted column of the correlation matrix, keeping only the variables with correlation coefficients > 0.5 or < -0.5. Here is the code:

sortedcor <- sort(cor(composite)[,1])
regvar = NULL

k = 1
for(i in 1:length(sortedcor)){
  if(sortedcor[i] > .5 | sortedcor[i] < -.5){
    regvar[k] = i
    k = k+1
  }
}
regvar

sigvars <- names(sortedcor[regvar])
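
The same selection can be written without a loop. This is a minimal sketch on a made-up data frame, toy (hypothetical, standing in for composite), whose first column is the response. It also removes the response from the result, since every variable has a correlation of 1 with itself and would otherwise pass the filter.

```r
# Hypothetical toy data; the first column plays the role of the response.
x1 <- 1:10
noise <- c(0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1)
toy <- data.frame(
  y  = 2 * x1 + noise,     # strongly positively correlated with x1
  x1 = x1,
  x2 = rep(c(1, -1), 5),   # weakly correlated with y
  x3 = -(x1^2)             # strongly negatively correlated with y
)

# Correlation of every column with the first column
cors <- cor(toy)[, 1]

# Keep names with |correlation| > 0.5, then drop the response itself
sigvars <- setdiff(names(cors)[abs(cors) > 0.5], names(toy)[1])
sigvars
```

Using abs() covers both the > 0.5 and the < -0.5 case in one comparison.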

However, it is not working in my lm() function:

model1 <- lm(data.matrix(composite[1]) ~ sigvars, data = composite)

Error: Error in model.frame.default(formula = data.matrix(composite[1]) ~ sigvars, : variable lengths differ (found for 'sigvars')

Upvotes: 2

Answers (2)

Gavin Simpson

Reputation: 174778

Think about what sigvars is for a minute...?

After sigvars <- names(sortedcor[regvar]), sigvars is a character vector of column names. Say your data have 100 rows and 5 variables come out as significant using the method you've chosen (which doesn't sound overly defensible to me). The model formula you are using will result in composite[, 1] being a vector of length 100 (100 rows) and sigvars being a character vector of length 5. lm() treats sigvars as a single predictor, finds that its length does not match the response, and stops with "variable lengths differ".
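
The mismatch can be reproduced on a small scale. The sketch below uses a hypothetical five-row data frame, toy, and a hand-written sigvars; neither comes from the original data.

```r
# Hypothetical toy data standing in for `composite`
toy <- data.frame(
  y  = c(1.1, 2.3, 2.9, 4.2, 5.0),
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(0, 1, 0, 1, 0)
)
sigvars <- c("x1", "x2")  # a character vector of names, not the data itself

n_response <- length(toy[, 1])  # one value per row
n_sigvars  <- length(sigvars)   # one value per selected name

# lm() looks up `sigvars` as if it were a predictor column; the lengths
# disagree, so model.frame() raises an error, which is captured here.
res <- tryCatch(
  lm(toy[, 1] ~ sigvars, data = toy),
  error = function(e) conditionMessage(e)
)
res
```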

Assuming you have the variables you want to include in the model, then you could do:

form <- reformulate(sigvars, response = names(composite)[1])
model1 <- lm(form, data = composite)

or

model1 <- lm(composite[,1] ~ ., data = composite[, sigvars])

In the latter case, do yourself a favour and write the name of the dependent variable into the formula instead of composite[,1].
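
To see what reformulate() builds from a vector of names, here is a minimal sketch on hypothetical data (toy, with made-up values). It assumes sigvars may contain the response's own name, as it can when the names come from a correlation with the first column, and removes it with setdiff() before building the formula.

```r
# Hypothetical toy data; column 1 is the response.
toy <- data.frame(
  y  = c(2.3, 3.8, 6.1, 7.6, 10.2, 12.0, 13.9, 16.3, 17.7, 20.1),
  x1 = 1:10,
  x2 = rep(c(1, -1), 5),
  x3 = c(5, 3, 8, 1, 9, 2, 7, 4, 10, 6)
)

# Assumed selection result; it includes the response name on purpose.
sigvars <- c("y", "x1", "x2")

response   <- names(toy)[1]
predictors <- setdiff(sigvars, response)  # keep the response off the right-hand side

# Builds the formula y ~ x1 + x2 from the character vectors
form  <- reformulate(predictors, response = response)
model <- lm(form, data = toy)
names(coef(model))
```

Because the formula names the response explicitly, the fitted model's output and predict() calls refer to y rather than to an expression such as composite[,1].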

Also, you don't seem to have appreciated the difference between [i] and [i,j] for data frames, hence you are doing data.matrix(composite[1]) which is taking the first component of composite, leaving it as a data frame, then converting that to a matrix via the data.matrix() function. All you really need is just the name of the dependent variable on the LHS of the formula.
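
The difference between the indexing forms is quick to check. A small sketch on a hypothetical data frame, toy:

```r
# Hypothetical toy data
toy <- data.frame(y = c(1.1, 2.3, 2.9), x1 = c(1, 2, 3))

one_col_df <- toy[1]       # list-style indexing: still a data frame with one column
as_vector  <- toy[, 1]     # matrix-style indexing: a plain numeric vector
by_name    <- toy[["y"]]   # extraction by name: the same numeric vector

class(one_col_df)
class(as_vector)

# With the response named in the formula, no indexing is needed at all.
fit <- lm(y ~ x1, data = toy)
```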

Upvotes: 2

user3416103

Reputation: 1

The error is here:

model1 <- lm(data.matrix(composite[1]) ~ sigvars, data = composite)

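sigvars is a character vector of names (names(data)), not the variables themselves. A model formula usually has the form lm(var1 ~ var2 + var3 + var4), with the predictors joined by +; what you have is effectively lm(var1 ~ var2 var3 var4).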
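
One way to get the + signs in place is to paste the names together and convert the string to a formula. This is only a sketch; the response name y and the contents of sigvars are assumptions made for illustration.

```r
# Assumed predictor names and response name, for illustration only
sigvars  <- c("x1", "x2")
response <- "y"

# Join the names with " + " to form the right-hand side
rhs <- paste(sigvars, collapse = " + ")

# Turn the assembled string into a formula object
form <- as.formula(paste(response, "~", rhs))
all.vars(form)
```

The resulting form can then be passed to lm() together with data = composite.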

Hopefully that helps.

Upvotes: 0
