p0kero

Reputation: 120

How to access individual variables of a dataset for linear regression?

I'm using the Boston dataset, from the MASS package.

I need to predict crim using the other 13 predictors separately, and save the slope coefficient of each model.

How can I automate this?

I don't know how to access the variables of the dataset in a for loop.

I have tried this to access a single variable using its index:

fit1 = lm(Boston[1]~Boston[2])

But it returns this:

Error in model.frame.default(formula = Boston[1] ~ Boston[2], drop.unused.levels = TRUE) : 
  invalid type (list) for variable 'Boston[1]'

I want to get access to individual variables in order to use a for loop executing 13 different lm()s: something like fit = Boston[i] ~ Boston[i+1]

Upvotes: 3

Views: 320

Answers (3)

Jilber Urbina

Reputation: 61154

You can also use lapply

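predictors <- names(Boston)[-1]  # all columns except the response crim; assumes library(MASS) is loaded
fits <- lapply(predictors, function(i) {temp <- lm(crim~get(i), data=Boston)$coefficients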
                                        names(temp)[2]<- i
                                        return(temp)})
fits
[[1]]
(Intercept)          zn 
 4.45369376 -0.07393498 

[[2]]
(Intercept)       indus 
 -2.0637426   0.5097763 

.... and so on....

If you only want a vector of slope coefficients, then try:

> setNames(sapply(fits, "[[", 2), predictors)
         zn       indus        chas         nox          rm         age         dis         rad 
-0.07393498  0.50977633 -1.89277655 31.24853120 -2.68405122  0.10778623 -1.55090168  0.61791093 
        tax     ptratio       black       lstat        medv 
 0.02974225  1.15198279 -0.03627964  0.54880478 -0.36315992 

Upvotes: 1

lm(crim ~ zn, data = Boston)

or

lm(Boston$crim ~ Boston$zn)

Use

names(Boston) 

to find the column names of Boston.

If you really want to get a column by index, the syntax for getting all rows of the 1st column is

Boston[,1]
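
The error in the question comes from `Boston[1]`, which is a one-column data frame (a list) rather than a numeric vector; `Boston[, 1]` and `Boston[[1]]` return the vector itself. A minimal sketch of the loop the question asks for, assuming crim is the first column of Boston (the name `slopes` is introduced here for illustration only):

```r
library(MASS)  # provides the Boston data frame

# Boston[1] is a one-column data frame; Boston[[1]] is the numeric vector inside it
class(Boston[1])
class(Boston[[1]])

# Illustrative container for the 13 slopes, named after each predictor
slopes <- numeric(ncol(Boston) - 1)
names(slopes) <- names(Boston)[-1]

for (i in 2:ncol(Boston)) {
  fit <- lm(Boston[[1]] ~ Boston[[i]])  # crim regressed on the i-th column
  slopes[i - 1] <- coef(fit)[2]         # the second coefficient is the slope
}
slopes
```

Indexing by position works, but the coefficient labels inside each fit become `Boston[[i]]`, so keeping the predictor names in a separate vector as above makes the result easier to read.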

Upvotes: 2

Ben Bolker

Reputation: 226077

reformulate() is a convenient way to set up formulas with specified predictors:

 library("MASS")
 get.slope <- function(pred) {
     fit <- lm(reformulate(pred,response="crim"),data=Boston)
     ## unname() to avoid duplicating name of response
     return(unname(coef(fit)[2]))
 }
 sapply(names(Boston)[-1],get.slope)
 ##          zn       indus        chas         nox          rm         age 
 ## -0.07393498  0.50977633 -1.89277655 31.24853120 -2.68405122  0.10778623 
 ##         dis         rad         tax     ptratio       black       lstat 
 ## -1.55090168  0.61791093  0.02974225  1.15198279 -0.03627964  0.54880478 
 ##        medv 
 ## -0.36315992 

Upvotes: 5
