Reputation: 301

Labeling outliers on boxplot in R

I would like to plot each column of a matrix as a boxplot and then label the outliers in each boxplot with the name of the row they belong to in the matrix. For example:

vv=matrix(c(1,2,3,4,8,15,30),nrow=7,ncol=4,byrow=F)
rownames(vv)=c("one","two","three","four","five","six","seven")
boxplot(vv)

I would like to label the outlier in each plot (here, 30) with the name of the row it belongs to. In this case 30 is in row 7, so the label should be "seven". Is there an easy way to do this? I have seen similar questions, but none of the answers worked the way I want.

Upvotes: 6

Answers (6)

Tony Knights

Reputation: 59

Alternatively, you could use the Boxplot function (capital B) from the car package, which labels outliers for you.

See the following link: https://CRAN.R-project.org/package=car

Upvotes: 5

Tal Galili

Reputation: 25306

Or you can simply run the code from this blog post:

source("https://raw.githubusercontent.com/talgalili/R-code-snippets/master/boxplot.with.outlier.label.r") # Load the function
set.seed(6484)
y <- rnorm(20)
x1 <- sample(letters[1:2], 20, replace = TRUE)
lab_y <- sample(letters, 20)
# plot a boxplot with interactions:
boxplot.with.outlier.label(y~x1, lab_y)

(It handles multiple outliers that are close to one another.)

[Image: boxplot with the outliers labelled]

Upvotes: 1

IRTFM

Reputation: 263301

In the example given it's a bit boring because all the outliers are in the same row, but here is the code:

bxpdat <- boxplot(vv)
text(bxpdat$group,                                              # the x locations 
     bxpdat$out,                                                # the y values
     rownames(vv)[which(vv == bxpdat$out, arr.ind=TRUE)[, 1]],  # the labels
     pos = 4)

This picks the row names whose values equal the "out" component (i.e., the outliers) of the value returned by boxplot. boxplot calls boxplot.stats for each box and returns those results. Take a look at:

 str(bxpdat)
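To make the "out" and "group" components concrete, here is a small sketch that rebuilds the question's vv so it runs on its own, calls boxplot with plot = FALSE (statistics only, no drawing), and compares the first box with boxplot.stats. "out" holds the outlier values and "group" the column each one came from; the fence arithmetic in the comments assumes the default coef = 1.5.

```r
# Rebuild the question's matrix: the same 7 values in each of 4 columns
vv <- matrix(c(1, 2, 3, 4, 8, 15, 30), nrow = 7, ncol = 4)
rownames(vv) <- c("one", "two", "three", "four", "five", "six", "seven")

# plot = FALSE returns the statistics without drawing anything
bxpdat <- boxplot(vv, plot = FALSE)
print(bxpdat$out)    # outlier values, one entry per outlier
print(bxpdat$group)  # index of the box (column) each outlier belongs to

# boxplot() computes each box with boxplot.stats(), coef = 1.5 by default
st <- boxplot.stats(vv[, 1])
print(st$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(st$out)    # 30 lies beyond the upper fence 11.5 + 1.5 * (11.5 - 2.5) = 25
```

Because "group" gives the x position and "out" the y value, these two vectors are exactly what the text() call above uses for placement; only the labels have to be looked up separately.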

Upvotes: 4

user4168562

Reputation: 91

There is a simple way. Note that the B in Boxplot in the following lines is a capital letter (this is the car package's function, not base R's boxplot).

library(car)

Boxplot(y ~ x, id.method="y")  # y, x: your response and grouping variables; car >= 3.0 takes id = list(method = "y") instead

Upvotes: 9

user7669

Reputation: 733

@sebastian-c This is a slight modification of DWin's solution that seems to work more generally, because it looks up each outlier only within its own column:

bx1 <- boxplot(vv, las = 2, cex.axis = 0.8)
if (length(bx1$out) != 0) {
  ## get the row of each outlier, searching only the column (group) it came from
  out.rows <- sapply(seq_along(bx1$out),
                     function(i) which(vv[, bx1$group[i]] == bx1$out[i]))
  text(bx1$group, bx1$out,
       rownames(vv)[out.rows],
       pos = 4)
}

Upvotes: 0

sebastian-c

Reputation: 15395

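@DWin's solution works very well for a single boxplot, but it can fail when values are duplicated across columns (vv == bxpdat$out recycles the vector of outliers over the whole matrix, so a value can be matched in the wrong column), as in the dataset I have created: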

#Create data
set.seed(1)
basenums <- c(1,2,3,4,8,15,30)
vv=matrix(c(basenums, sample(basenums), 1-basenums, 
          c(0, 29, 30, 31, 32, 33, 60)),nrow=7,ncol=4,byrow=F)
dimnames(vv)=list(c("one","two","three","four","five","six","seven"), 1:4)

On this dataset, @DWin's solution gives:

[Image: the four boxplots labelled by @DWin's solution]

This is wrong: in the 4th boxplot, the minimum and the maximum cannot both come from the same row.

This solution is monstrous (I hope it can be simplified), but effective.

#Reshape data
vv_dat <- as.data.frame(vv)
vv_dat$row <- row.names(vv_dat)
library(reshape2)
new_vv <- melt(vv_dat, id.vars="row")

#Get boxplot data
bxpdat <- as.data.frame(boxplot(value~variable, data=new_vv)[c("out", "group")])

#Get matches with boxplot data
text_guide <- do.call(rbind, apply(bxpdat, 1, 
    function(x) new_vv[new_vv$value==x[1]&new_vv$variable==x[2], ]))

#Add labels
with(text_guide, text(x=as.numeric(variable)+0.2, y=value, labels=row))
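If reshape2 is not an option, the same idea can be sketched in base R by working column by column: boxplot.stats (which boxplot uses internally) returns each column's outliers, and which(x %in% ...) turns them into row positions, so every label comes from the outlier's own column and equal values in other columns are never picked up. The helper name label_outliers is hypothetical, "range" is passed to both boxplot and boxplot.stats so the two agree on what counts as an outlier, and the 0.2 x-offset simply mirrors the code above.

```r
# Hypothetical helper: draw one box per column of m and label each outlier
# with its row name. Returns (invisibly) a data frame of the labels drawn.
label_outliers <- function(m, range = 1.5, offset = 0.2, ...) {
  boxplot(m, range = range, ...)
  res <- NULL
  for (j in seq_len(ncol(m))) {
    x <- m[, j]
    # row positions in column j whose values are outliers for this box
    idx <- which(x %in% boxplot.stats(x, coef = range)$out)
    if (length(idx) > 0) {
      text(j + offset, x[idx], labels = rownames(m)[idx])
      res <- rbind(res, data.frame(column = j, value = x[idx],
                                   label = rownames(m)[idx],
                                   stringsAsFactors = FALSE))
    }
  }
  invisible(res)
}

# Usage on the dataset from this answer
set.seed(1)
basenums <- c(1, 2, 3, 4, 8, 15, 30)
vv <- matrix(c(basenums, sample(basenums), 1 - basenums,
               c(0, 29, 30, 31, 32, 33, 60)), nrow = 7, ncol = 4)
dimnames(vv) <- list(c("one", "two", "three", "four", "five", "six", "seven"), 1:4)
labs <- label_outliers(vv)
print(labs)
```

Because the lookup uses positions rather than values, a value repeated within one column is also labelled once per occurrence, each with its own row name.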

[Image: the four boxplots labelled by this solution]

Upvotes: 4
