Identify and remove data points with high leverage and large residuals

Question

I want to identify data points with high leverage and large residuals, remove them, and rerun the linear regression. Specifically, I want to remove points with studentized residuals larger than 3 in absolute value and points with Cook's D > 4/n. How can I do that with the sample data and then repeat the same analysis without the influential points?

Sample data:

hsb2 <- read.csv("http://www.ats.ucla.edu/stat/data/hsb2.csv")

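# race.f is not a column in the CSV; assumed here to be race recoded as a factor
hsb2$race.f <- factor(hsb2$race)

lm1 <- lm(write ~ read + ses + prog + race.f, data = hsb2)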

Neal Fultz · Accepted Answer

Set the weights of those points to zero, then update the model:

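# TRUE for points to keep: |studentized residual| < 3 and Cook's D < 4/n
w <- abs(rstudent(lm1)) < 3 & cooks.distance(lm1) < 4 / nrow(lm1$model)
# Zero-weight points are ignored when the model is refit
lm2 <- update(lm1, weights = as.numeric(w))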

This is probably a weak approach statistically, but at least the code isn't too hard...
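
If you would rather drop the rows than zero-weight them, here is a minimal sketch of the same idea. It uses the built-in mtcars data and a made-up formula (mpg ~ wt + hp) as stand-ins so it runs without downloading anything. The names fit1, fit2 and flagged are just for illustration, and the cutoffs are the ones from the question. It assumes the data has no missing values in the model variables, so the diagnostics line up row for row with the data frame.

```r
# Stand-in model on a built-in dataset (assumption: swap in your own lm1 / hsb2)
fit1 <- lm(mpg ~ wt + hp, data = mtcars)

n <- nobs(fit1)

# TRUE for rows that exceed either cutoff from the question
flagged <- abs(rstudent(fit1)) > 3 | cooks.distance(fit1) > 4 / n

# Look at what would be dropped before dropping it
print(mtcars[flagged, c("mpg", "wt", "hp")])

# Refit the same formula on the remaining rows
fit2 <- update(fit1, data = mtcars[!flagged, ])

# Side-by-side coefficients: original fit vs. fit without the flagged rows
print(cbind(original = coef(fit1), trimmed = coef(fit2)))
```

Printing the flagged rows first makes it easier to judge whether they are data errors or just unusual but valid cases. If the model data does contain missing values, fitting with na.action = na.exclude should keep the diagnostic vectors the same length as the data frame.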
