Reputation: 11
I have a data set [1000 x 80]: 1000 data points, each with values for 80 variables. I have to fit a linear regression on two of the variables, price and area, and identify the 5 data points with the highest squared residuals. For these data points, I have to display 4 of the 80 variable values.
I do not know how to use the residuals to identify the original data points. All I have at the moment is:
model_lm <- lm(log(price) ~ log(area), data = ames)
Can I please get some guidance on how to approach this problem?
Upvotes: 1
Views: 4769
Reputation:
The model_lm object contains a component called 'residuals' that holds the residuals in the same order as the original observations (assuming no rows were dropped because of missing values). If I'm understanding the question correctly, an easy way to do this in base R is:
ames$residuals <- model_lm$residuals ## Add the residuals to the data.frame
o <- order(ames$residuals^2, decreasing = TRUE) ## Row indices sorted by squared residual, largest first
ames[o[1:5], ] ## Return the 5 rows with the largest squared residuals
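To show only 4 of the 80 variables, as the question asks, the same row index can be combined with a vector of column names. The following is a minimal sketch on simulated data, not the real data set: the `bedrooms` and `year_built` columns and the simulated values are assumptions made up for illustration, so substitute the column names you actually need.

```r
# Simulated stand-in for the ames data (hypothetical columns and values).
set.seed(1)
n <- 100
ames <- data.frame(
  price      = exp(rnorm(n, mean = 12, sd = 0.4)),
  area       = exp(rnorm(n, mean = 7, sd = 0.3)),
  bedrooms   = sample(1:5, n, replace = TRUE),
  year_built = sample(1950:2010, n, replace = TRUE)
)

model_lm <- lm(log(price) ~ log(area), data = ames)

# One residual per row here, because the simulated data has no missing values.
sq_res <- residuals(model_lm)^2

# Row indices of the 5 largest squared residuals.
top5 <- order(sq_res, decreasing = TRUE)[1:5]

# Keep only the columns of interest (names are assumptions).
cols <- c("price", "area", "bedrooms", "year_built")
result <- ames[top5, cols]
print(result)
```

If the real data has missing values in price or area, lm() drops those rows by default and the residual vector no longer lines up with the rows of ames. Fitting with na.action = na.exclude makes residuals(model_lm) return NA for the dropped rows, which keeps the lengths aligned.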
Upvotes: 1