Z. Doe

Reputation: 11

Linear Regression Plot with Mislabeled Outliers

I have fit a series of multiple linear regression models and am checking them with diagnostic plots, using the method and code from this link (http://www.r-bloggers.com/checking-glm-model-assumptions-in-r/).

Each model has no more than 53 data points. However, some of the outliers in the diagnostic plots are labeled with numbers above 53, ranging from 58 to 107. Do the labels on outliers or influential points not correspond to the individual data points? If they don't, what do the labels mean, and how can I tell which of my data points are the outliers? I have counted the points in my plots, and none has more than 53.

I have attached a screenshot of my regression plot output. There are 53 points in this plot; however, two of the notable points are labeled 90 and 106.

[Screenshot: regression plot example]

Upvotes: 1

Views: 401

Answers (1)

Roland

Reputation: 132706

plot.lm labels the points with the corresponding row names (by default `names(residuals(mod))`), not with their position in the data:

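set.seed(42)
# Five observations with letter row names instead of 1:5
DF <- data.frame(x = 1:5, y = 2 + 3 * 1:5 + rnorm(5))
rownames(DF) <- letters[1:5]
# Turn observation "c" into an obvious outlier
DF$y[3] <- 1e3

mod <- lm(y ~ x, data = DF)
# Draw the first four diagnostic plots in a 2 x 2 grid;
# the flagged points are labelled with letters, not with 1 to 5
par(mfrow = c(2,2))
plot(mod, 1:4)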

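[Resulting plot: Residuals vs Fitted, Normal Q-Q, Scale-Location and Cook's distance panels, with points labelled by row name]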
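The row-name behaviour would also explain labels above 53 in the question. A likely cause (an assumption, since the question does not show how the data were prepared) is that the 53 rows were taken from a larger data frame, or that rows with missing values were dropped; in both cases R keeps the original row names. This sketch uses hypothetical data (`full`, `sub`) to show that effect, how to look up the observation behind a label, and two ways to get positional labels instead: the `labels.id` argument of `plot.lm`, or resetting the row names before fitting.

```r
# Hypothetical example: a 120-row data frame from which only 53 rows are modelled
set.seed(1)
full <- data.frame(x = rnorm(120), y = rnorm(120))

# Subsetting keeps the ORIGINAL row names ("51" ... "103"), not 1..53
sub <- full[51:103, ]
mod <- lm(y ~ x, data = sub)

# plot.lm labels points with names(residuals(mod)) by default,
# so a label such as "90" refers to the row named "90"
lab <- names(residuals(mod))
range(as.integer(lab))                # 51 103, although nobs(mod) is 53

# Look up the observation behind a label, and its position among the 53 rows
sub["90", ]
pos <- which(rownames(sub) == "90")   # 40

# Option 1: pass positional labels to plot.lm via its labels.id argument
par(mfrow = c(2, 2))
plot(mod, which = 1:4, labels.id = seq_len(nobs(mod)))

# Option 2: renumber the rows before fitting so the default labels run 1..53
rownames(sub) <- NULL
mod2 <- lm(y ~ x, data = sub)
```

Either option labels the points 1 to 53. Keeping the original row names can still be the more useful choice, because each label then points straight back to a row of the full dataset.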

Upvotes: 1
