Paras Joshi

Reputation: 11

Identify the outliers with the highest squared residuals under the Linear regression model in R

I have a data set (1000 x 80): 1000 data points, each with 80 variables. I have to fit a linear regression on two of the variables, price and area, and identify the 5 data points with the highest squared residuals. For these data points, I have to display 4 of the 80 variable values.

I do not know how to use the residuals to identify the original data points. All I have at the moment is:

model_lm <- lm(log(price) ~ log(area), data = ames) 

Can I please get some guidance on how to approach this problem?

Upvotes: 1

Views: 4769

Answers (1)

KMcC

Reputation:

The model_lm object contains a component called 'residuals' that holds the residuals in the same order as the original observations (provided no rows were dropped for missing values). If I'm understanding the question correctly, an easy way to do this in base R is:

ames$residuals <- model_lm$residuals  ## Add the residuals to the data.frame

o <- order(ames$residuals^2, decreasing = TRUE)   ## Row indices, largest squared residual first

ames[o[1:5],]   ## Return results
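
To show only 4 of the 80 variables, the same indexing can take a vector of column names. The following is a minimal, self-contained sketch: the data frame is a synthetic stand-in for ames, and the column names Year.Built and Bedroom.AbvGr are assumptions chosen for illustration, not names confirmed by the question. It also passes na.action = na.exclude, which should keep the residuals aligned with the rows of the data frame if some rows have missing values.

```r
# Synthetic stand-in for the ames data (assumption: the real data is not available here).
# Year.Built and Bedroom.AbvGr are hypothetical column names used only for illustration.
set.seed(1)
n <- 100
ames <- data.frame(
  price         = exp(rnorm(n, mean = 12, sd = 0.4)),
  area          = exp(rnorm(n, mean = 7, sd = 0.3)),
  Year.Built    = sample(1900:2010, n, replace = TRUE),
  Bedroom.AbvGr = sample(1:5, n, replace = TRUE)
)

# na.exclude makes residuals() return NA for rows dropped from the fit,
# so the residual vector has one entry per row of ames.
model_lm <- lm(log(price) ~ log(area), data = ames, na.action = na.exclude)

# Squared residuals, stored next to the original observations
ames$sq_resid <- residuals(model_lm)^2

# Row indices of the 5 largest squared residuals (NAs, if any, sort last)
top5 <- order(ames$sq_resid, decreasing = TRUE)[1:5]

# Keep only the columns of interest for those rows
cols <- c("price", "area", "Year.Built", "Bedroom.AbvGr")
result <- ames[top5, cols]
print(result)
```

Replace cols with whichever four variables you need to display; the row selection does not depend on which columns you keep.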

Upvotes: 1
