SSJ5Broli

Reputation: 49

Linear regression: Error in eval, object 'MEDV' not found

What is wrong with my code? I'm getting this error: Error in eval(predvars, data, env) : object 'MEDV' not found

housing.df<-read.csv("BostonHousing.csv")
# use first 500 rows of data
housing.df <- housing.df[1:500, ]
set.seed(1)
# randomly select 100 records; sample() returns row indices, not a data frame
housing1.index<- sample (nrow (housing.df), 100)
housing1.index #the 100 random rows selected
housing1.df<- housing.df[housing1.index, ]
head (housing1.df) # data frame with the 100 randomly sampled rows
selected.var <- c(1, 4, 6)
selected.var
set.seed(1)  # set seed for reproducing the partition
# randomly select 350 row indices from the 500 rows of data (350 is 70% of 500)
train.index <- sample(c(1:500), 350) 
train.index
# save the selected 350 rows and variables in the train.df data frame
train.df <- housing.df[train.index, selected.var]
head (train.df)
# save the remaining 150 rows and the selected variables in the valid.df data frame
valid.df <- housing.df[-train.index, selected.var]
# use lm() to run a linear regression of Price on all 11 predictors in the
# training set. 
# use . after ~ to include all the remaining columns in train.df as predictors.
housing.lm <- lm(MEDV ~ ., data = train.df)

Upvotes: 0

Views: 459

Answers (1)

Ben Bolker

Reputation: 226247

I think there are (at least) two problems here.

Assuming this is the same Boston Housing data set as the BostonHousing data set in the mlbench package, the median value variable is medv, not MEDV: R is case-sensitive.

library(mlbench)
data(BostonHousing)
names(BostonHousing)
 [1] "crim"    "zn"      "indus"   "chas"    "nox"     "rm"      "age"    
 [8] "dis"     "rad"     "tax"     "ptratio" "b"       "lstat"   "medv" 

The second problem is that you extracted only variables 1, 4, and 6 from the data set: this doesn't include the response variable.

names(BostonHousing)[selected.var]
[1] "crim" "chas" "rm"  

If I do

selected.var <- c("crim", "chas", "rm", "medv")

instead of {1,4,6}, run the code to define the training set, and then use

housing.lm <- lm(medv ~ ., data = train.df)

I get something reasonably sensible.
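
For anyone who wants to reproduce the second problem without the CSV file, here is a minimal sketch using a small simulated data frame. The name toy.df and all of its values are made up for illustration; only the column names mirror the ones above.

```r
# Toy stand-in for the housing data: simulated values, NOT the real data set.
set.seed(1)
toy.df <- data.frame(
  crim = runif(50),
  chas = rep(c(0, 0, 0, 0, 1), 10),
  rm   = rnorm(50, mean = 6, sd = 0.7),
  medv = rnorm(50, mean = 22, sd = 9)
)

# Predictors only: the response column is dropped, so lm() cannot find it.
bad.df  <- toy.df[, c("crim", "chas", "rm")]
bad.fit <- try(lm(medv ~ ., data = bad.df), silent = TRUE)
inherits(bad.fit, "try-error")   # TRUE, the fit failed

# Keeping the response in the subset lets the same formula work.
good.df  <- toy.df[, c("crim", "chas", "rm", "medv")]
good.fit <- lm(medv ~ ., data = good.df)
names(coef(good.fit))
```

Selecting columns by name rather than by position also makes it easier to see at a glance whether the response is included.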

(It's possible that you have a version of the data set where MEDV is capitalized, and you only have the second problem. In any case, check the names of your data set ...)
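
One way to do that check, sketched under the assumption that the response column is called medv in some capitalization. find_column() is a hypothetical helper written for this example, not a base R function, and toy.df holds made-up values.

```r
# Hypothetical helper: return the actual column name that matches `name`,
# ignoring case, and fail loudly if there is no unique match.
find_column <- function(df, name) {
  hit <- which(tolower(names(df)) == tolower(name))
  if (length(hit) != 1) {
    stop("expected exactly one column matching '", name, "', found ", length(hit))
  }
  names(df)[hit]
}

# Made-up data with upper-case column names, as some CSV versions might have.
toy.df <- data.frame(
  CRIM = c(0.10, 0.25, 0.05, 0.40, 0.15, 0.30),
  RM   = c(6.2, 5.8, 6.9, 5.5, 6.4, 6.0),
  MEDV = c(23, 19, 31, 16, 25, 21)
)

names(toy.df)                            # always look at the names first
response <- find_column(toy.df, "medv")  # resolves to the name as stored

# Build the formula from the resolved name instead of hard-coding the case.
toy.lm <- lm(reformulate(".", response = response), data = toy.df)
```

Building the formula with reformulate() means the model call keeps working whichever capitalization the file happens to use.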

Upvotes: 2
