I have a data frame structured like this:
set.seed(123)
data <- data.frame(
ID=factor(letters[seq(20)]),
Location = rep(c("alph","brav", "char","delt"), each = 5),
Var1 = rnorm(20),
Var2 = rnorm(20),
Var3 = rnorm(20)
)
I have built a linear model:
mod1 <- lm(Var1 ~ Location, data)
When I call
plot(mod1)
on the model object, outliers are labeled with their row index. Is there a way to label those points with their value in ID instead? In this example, points 6, 16, and 18 are labeled in the plots, and I want them to be labeled f, p, and r, respectively, because those are their corresponding values in ID.
For lm objects, plot() dispatches to stats:::plot.lm, which draws the diagnostic plots. Two of its arguments are relevant here:
id.n: number of points to be labelled in each plot, starting with the most extreme.
labels.id: vector of labels, from which the labels for extreme points will be chosen. ‘NULL’ uses observation numbers.
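A minimal sketch of the two arguments used together, rebuilding the example data from the question. The value id.n = 5 is an arbitrary choice for illustration. As I read the plot.lm source, the Residuals vs Fitted panel (which = 1) picks the points with the largest absolute residuals, so the last line is meant to list the same IDs that appear on the plot; treat that ranking rule as an assumption rather than documented behaviour.

```r
# Example data from the question
set.seed(123)
data <- data.frame(
  ID = factor(letters[seq(20)]),
  Location = rep(c("alph", "brav", "char", "delt"), each = 5),
  Var1 = rnorm(20),
  Var2 = rnorm(20),
  Var3 = rnorm(20)
)
mod1 <- lm(Var1 ~ Location, data)

# Label the 5 most extreme points (the default is 3) with ID instead of row numbers
plot(mod1, which = 1, id.n = 5, labels.id = data$ID)

# Assumption: which = 1 ranks points by absolute residual, most extreme first
top5 <- order(-abs(residuals(mod1)))[1:5]
print(as.character(data$ID[top5]))
```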
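By default id.n = 3, so each plot always labels its 3 most extreme observations by that plot's own criterion: for example, the largest absolute residuals in the Residuals vs Fitted plot and the largest Cook's distances in the Cook's distance plot. Being among the three most extreme does not make a point an outlier, so be careful about interpreting the labelled points that way.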
To get the three observations with the largest Cook's distance, you can do:
mod1 <- lm(Var1 ~ Location, data)
outl <- order(-cooks.distance(mod1))[1:3]
outl
[1] 18 6 16
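To see which IDs those rows correspond to, and why ranking by Cook's distance picks the same points that the Residuals vs Fitted plot labels, here is a short check on the question's data. Every Location has 5 rows, so each observation has leverage 1/5; with equal leverage, Cook's distance is proportional to the squared residual, so the two rankings coincide. In an unbalanced design they need not.

```r
# Example data from the question
set.seed(123)
data <- data.frame(
  ID = factor(letters[seq(20)]),
  Location = rep(c("alph", "brav", "char", "delt"), each = 5),
  Var1 = rnorm(20),
  Var2 = rnorm(20),
  Var3 = rnorm(20)
)
mod1 <- lm(Var1 ~ Location, data)

# Leverage is 1/5 for every row because each group has 5 observations
h <- hatvalues(mod1)
print(range(h))

# Top 3 rows by Cook's distance and by absolute residual
by_cook <- order(-cooks.distance(mod1))[1:3]
by_resid <- order(-abs(residuals(mod1)))[1:3]
print(by_cook)

# Map the row numbers to the ID column
print(as.character(data$ID[by_cook]))  # "r" "f" "p"
```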
To plot, you can either pass ID as labels.id, or draw the plot from scratch:
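par(mfrow = c(1, 2))
# Left: the built-in plot, labelled with ID instead of row numbers
plot(mod1, which = 1, labels.id = data$ID)
# Right: the same plot drawn by hand, labelling the rows in outl
plot(fitted(mod1), residuals(mod1))
panel.smooth(fitted(mod1), residuals(mod1))
text(fitted(mod1)[outl] + 0.01, residuals(mod1)[outl],
     data$ID[outl], col = "red")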
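If you need the hand-drawn version more than once, it can be wrapped in a function. This is a sketch only: label_extreme() is a hypothetical helper, not part of base R or stats, and it ranks points by absolute residual (the same rule assumed above for the Residuals vs Fitted panel) rather than by Cook's distance.

```r
# Example data from the question
set.seed(123)
data <- data.frame(
  ID = factor(letters[seq(20)]),
  Location = rep(c("alph", "brav", "char", "delt"), each = 5),
  Var1 = rnorm(20),
  Var2 = rnorm(20),
  Var3 = rnorm(20)
)
mod1 <- lm(Var1 ~ Location, data)

# label_extreme() is a hypothetical helper, not part of base R or stats.
# It draws residuals vs fitted values and labels the n points with the
# largest absolute residuals using the supplied labels.
label_extreme <- function(model, labels, n = 3) {
  fit <- fitted(model)
  res <- residuals(model)
  idx <- order(-abs(res))[seq_len(n)]  # most extreme first
  plot(fit, res, type = "n", xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 3)
  panel.smooth(fit, res)               # draws the points plus a lowess curve
  # pos = 4 places each label to the right of its point
  text(fit[idx], res[idx], labels = as.character(labels[idx]),
       pos = 4, col = "red")
  invisible(idx)
}

idx <- label_extreme(mod1, data$ID)
```

Using pos = 4 instead of adding a fixed 0.01 to the x coordinate keeps the label offset constant regardless of the scale of the fitted values.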
To go through all the plots, do:
plot(mod1,labels.id=data$ID)