Reputation: 127
I have a dataset (found here: https://netfiles.umn.edu/users/nacht001/www/nachtsheim/Kutner/Appendix%20C%20Data%20Sets/APPENC01.txt) and I have done some R coding for linear regression. The columns in the linked dataset are not labeled, so I labeled them myself and saved the result as a CSV. I apologize that I can't post that file here, but the columns I am using are column 3 (age), column 4 (infection), column 5 (culratio), column 9 (region), column 10 (census) and column 12 (service). I named the dataset hospital.
My assignment says: "For each geographic region, regress infection risk (Y) against the predictor variables age, culratio, census, service using a first order regression model." Then I need to find the MSE for each region. This is the code I have.
NE<- subset(hospital, region=="1")
NC<- subset(hospital, region=="2")
S<- subset(hospital, region=="3")
W<- subset(hospital, region=="4")
Then, to fit a first-order linear regression model, I use the same basic code for each region:
NE.Model<- lm(NE$infection~ NE$age + NE$culratio + NE$census + NE$service)
summary(NE.Model)
I can get the adjusted R-squared value from this, but how do I find the MSE from this output?
Upvotes: 0
Views: 8604
Reputation: 145755
Moving my comment to an answer. The "errors" or "residuals" are part of the model object, NE.Model$residuals, so getting the mean squared error is as easy as mean(NE.Model$residuals^2).
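One caveat worth spelling out: mean(residuals^2) divides the error sum of squares by n, whereas regression textbooks (Kutner's included, as far as I know) usually define MSE as SSE divided by the residual degrees of freedom, n - p. Here is a minimal sketch showing both. The data frame below is a made-up stand-in for the asker's NE subset (I don't have the labeled CSV), so the numbers themselves mean nothing; only the column names follow the question.

```r
# Synthetic stand-in for the NE subset of `hospital`; all values are invented.
set.seed(1)
n <- 40
NE <- data.frame(
  age      = rnorm(n, 53, 4),
  culratio = runif(n, 2, 40),
  census   = runif(n, 20, 600),
  service  = runif(n, 10, 80)
)
NE$infection <- 1 + 0.05 * NE$age + 0.04 * NE$culratio + rnorm(n)

# Same model as in the question, written with data = instead of NE$...
NE.Model <- lm(infection ~ age + culratio + census + service, data = NE)

# Error sum of squares
sse <- sum(residuals(NE.Model)^2)

# Version 1: average squared residual, SSE / n
mse_n <- mean(residuals(NE.Model)^2)

# Version 2: SSE / (n - p), with p = 5 estimated coefficients here
mse_df <- sse / df.residual(NE.Model)

# Version 2 is the square of the "Residual standard error" in summary()
all.equal(mse_df, summary(NE.Model)$sigma^2)
```

If the assignment wants the ANOVA-table MSE, the second version is probably the one to report; it is always a bit larger than the first because it divides by n - p rather than n.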
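Just as a note, you could do this in fewer steps by fitting a single model with region as a fixed effect (interacted with the predictors, if you want the same fits as the separate regressions) and then calculating the MSE for each region's subset of the residuals. Same difference, really.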
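A rough sketch of that one-model route, again on invented data with the question's column names. For it to reproduce the four separate regressions exactly, region has to interact with every predictor rather than enter only as an additive term; with just an additive region term the slopes are shared across regions and the residuals will differ.

```r
# Synthetic stand-in for `hospital`; all values are invented.
set.seed(2)
n <- 120
hospital <- data.frame(
  region   = factor(rep(1:4, each = n / 4)),
  age      = rnorm(n, 53, 4),
  culratio = runif(n, 2, 40),
  census   = runif(n, 20, 600),
  service  = runif(n, 10, 80)
)
hospital$infection <- 1 + 0.05 * hospital$age +
  0.04 * hospital$culratio + rnorm(n)

# One model: each region gets its own intercept and its own slopes
full <- lm(infection ~ region * (age + culratio + census + service),
           data = hospital)

# Mean squared residual within each region
mse_by_region <- tapply(residuals(full)^2, hospital$region, mean)

# Cross-check against four separate fits, one per region
mse_separate <- sapply(split(hospital, hospital$region), function(d) {
  fit <- lm(infection ~ age + culratio + census + service, data = d)
  mean(residuals(fit)^2)
})

all.equal(as.vector(mse_by_region), as.vector(mse_separate))
```

Both results here use the divide-by-n version; to get SSE / (n - p) per region, sum the squared residuals within each region and divide by that region's row count minus 5.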
Upvotes: 1