Reputation: 13
I want to identify data points with high leverage and large residuals. My aim is to remove them and repeat the linear regression analysis. Specifically, I want to remove points with studentized residuals larger than 3 and points with Cook's D > 4/n. How can I do that with the sample data, and then run the same analysis without the influential points?
Sample data:
hsb2 <- read.csv("http://www.ats.ucla.edu/stat/data/hsb2.csv")
# race.f is not a column in the CSV; presumably it is factor(hsb2$race)
lm1 <- lm(write ~ read + ses + prog + race.f, data = hsb2)
Upvotes: 1
Views: 14985
Reputation: 23
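You could also remove the points with a large Cook's distance or a large studentized residual, then refit:
HighLeverage <- cooks.distance(lm1) > (4 / nrow(hsb2))  # Cook's D > 4/n
LargeResiduals <- abs(rstudent(lm1)) > 3  # abs() so large negative residuals are caught too
hsb2 <- hsb2[!HighLeverage & !LargeResiduals, ]  # keep rows flagged by neither rule
lm1 <- lm(write ~ read + ses + prog + race.f, data = hsb2)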
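Before dropping anything, it may help to look at which rows get flagged, and by which rule. Here is a minimal sketch. It uses the built-in mtcars data as a stand-in for hsb2 (an assumption, so that it runs without the download), and the names fit, diagn and flagged are only illustrative:

```r
# Stand-in model on built-in data (assumption: replace with lm1 / hsb2)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# One row of diagnostics per observation used in the fit
diagn <- data.frame(
  rstud = rstudent(fit),        # externally studentized residuals
  cooks = cooks.distance(fit)   # Cook's distance
)
diagn$big_resid <- abs(diagn$rstud) > 3
diagn$big_cooks <- diagn$cooks > 4 / nobs(fit)

# Rows flagged by at least one rule
flagged <- diagn[diagn$big_resid | diagn$big_cooks, ]
print(flagged)
```

Keeping the diagnostics in a separate data frame leaves the original data untouched, so you can still compare the fits with and without those rows.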
Upvotes: 0
Reputation: 9687
Set the weights of those points to zero, then update the model:
w <- abs(rstudent(lm1)) < 3 & abs(cooks.distance(lm1)) < 4 / nrow(lm1$model)
lm2 <- update(lm1, weights = as.numeric(w))
This is probably a weak approach statistically, but at least the code isn't too hard...
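As far as I can tell, zero weights and physically dropping the rows give the same coefficients, since lm() leaves zero-weight observations out of the fit. A quick sketch to check that, again on the built-in mtcars data rather than hsb2 (an assumption, so that it runs standalone; keep, fit_weights and fit_subset are illustrative names):

```r
# Stand-in model on built-in data (assumption: replace with lm1 / hsb2)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# TRUE for rows to keep: |studentized residual| < 3 and Cook's D < 4/n
keep <- abs(rstudent(fit)) < 3 & cooks.distance(fit) < 4 / nrow(fit$model)

# Approach A: zero weights for the flagged rows
fit_weights <- lm(mpg ~ wt + hp, data = mtcars, weights = as.numeric(keep))

# Approach B: refit on the retained rows only
fit_subset <- lm(mpg ~ wt + hp, data = mtcars[keep, ])

# Compare the estimated coefficients of the two fits
print(all.equal(coef(fit_weights), coef(fit_subset)))
```

The weights version keeps the fitted object the same length as the original data, which can be convenient for plotting. The subset version is arguably easier to explain.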
Upvotes: 4