Reputation: 301
I would like to plot each column of a matrix as a boxplot and then label the outliers in each boxplot with the name of the matrix row they belong to. To use an example:
vv=matrix(c(1,2,3,4,8,15,30),nrow=7,ncol=4,byrow=F)
rownames(vv)=c("one","two","three","four","five","six","seven")
boxplot(vv)
I would like to label the outlier in each plot (in this case 30) with the name of the row it belongs to; here, 30 belongs to row 7. Is there an easy way to do this? I have seen similar questions asked, but none of the answers worked the way I want.
Upvotes: 6
Views: 45596
Reputation: 59
Alternatively, you could use the "Boxplot" function from the {car} package, which labels outliers for you.
See the following link: https://CRAN.R-project.org/package=car
Upvotes: 5
Reputation: 25306
Or you can simply run the code from this blog post:
source("https://raw.githubusercontent.com/talgalili/R-code-snippets/master/boxplot.with.outlier.label.r") # Load the function
set.seed(6484)
y <- rnorm(20)
x1 <- sample(letters[1:2], 20,T)
lab_y <- sample(letters, 20)
# plot a boxplot with interactions:
boxplot.with.outlier.label(y~x1, lab_y)
(This function also handles multiple outliers that are close to one another.)
Upvotes: 1
Reputation: 263301
In the example given it's a bit boring because the outliers are all in the same row, but here is the code:
bxpdat <- boxplot(vv)
text(bxpdat$group, # the x locations
bxpdat$out, # the y values
rownames(vv)[which(vv == bxpdat$out, arr.ind=TRUE)[, 1]], # the labels
pos = 4)
This picks the rownames that have values equal to the "out" list (i.e., the outliers) in the result of boxplot. The boxplot function calls boxplot.stats and returns its values. Take a look at:
str(bxpdat)
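As a minimal sketch of what that list contains for the question's matrix (base R only; plot = FALSE just skips the drawing), the two components used for labeling are out and group:

```r
# Rebuild the matrix from the question
vv <- matrix(c(1, 2, 3, 4, 8, 15, 30), nrow = 7, ncol = 4)
rownames(vv) <- c("one", "two", "three", "four", "five", "six", "seven")

# plot = FALSE returns the summary list without drawing the boxplots
bxpdat <- boxplot(vv, plot = FALSE)

bxpdat$out    # outlier values, one entry per outlying point
bxpdat$group  # index of the box (here: the column) each outlier belongs to
```

Because out and group have the same length and order, they can be passed straight to text() as the y and x positions.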
Upvotes: 4
Reputation: 91
There is a simple way. Note that the B in Boxplot in the following lines is a capital letter.
library(car)
Boxplot(y ~ x, id.method="y")
# Newer versions of car (3.0 and later) take this option as: Boxplot(y ~ x, id=list(method="y"))
Upvotes: 9
Reputation: 733
@sebastian-c This is a slight modification of DWin's solution that seems to work more generally:
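bx1 <- boxplot(vv, las=2, cex.axis=.8)
if (length(bx1$out) != 0) {
  ## get the row of each outlier, searching only within its own column
  ## ([1] keeps the first match if a value is repeated in that column)
  out.rows <- sapply(seq_along(bx1$out), function(i) which(vv[, bx1$group[i]] == bx1$out[i])[1])
  text(bx1$group, bx1$out,
       rownames(vv)[out.rows],
       pos=4)
}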
Upvotes: 0
Reputation: 15395
@DWin's solution works very well for a single boxplot, but will fail for anything with duplicate values, like the dataset I have created:
#Create data
set.seed(1)
basenums <- c(1,2,3,4,8,15,30)
vv=matrix(c(basenums, sample(basenums), 1-basenums,
c(0, 29, 30, 31, 32, 33, 60)),nrow=7,ncol=4,byrow=F)
dimnames(vv)=list(c("one","two","three","four","five","six","seven"), 1:4)
On this dataset, @DWin's solution gives a wrong result: in the 4th boxplot, it is not possible for the minimum and maximum to be in the same row.
This solution is monstrous (and I hope can be simplified), but effective.
#Reshape data
vv_dat <- as.data.frame(vv)
vv_dat$row <- row.names(vv_dat)
library(reshape2)
new_vv <- melt(vv_dat, id.vars="row")
#Get boxplot data
bxpdat <- as.data.frame(boxplot(value~variable, data=new_vv)[c("out", "group")])
#Get matches with boxplot data
text_guide <- do.call(rbind, apply(bxpdat, 1,
function(x) new_vv[new_vv$value==x[1]&new_vv$variable==x[2], ]))
#Add labels
with(text_guide, text(x=as.numeric(variable)+0.2, y=value, labels=row))
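A shorter base-R alternative, offered as a sketch rather than a drop-in replacement: compute the outliers column by column with boxplot.stats, so a value is only ever matched within its own column. The helper name outlier_labels is made up for this example, and the second column uses rev() instead of sample() so the result does not depend on the random number generator.

```r
# Hypothetical helper (not part of base R or any package): for each column,
# find the outliers and the names of the rows they come from.
outlier_labels <- function(m) {
  per_col <- lapply(seq_len(ncol(m)), function(j) {
    out  <- boxplot.stats(m[, j])$out   # default 1.5 * IQR rule, as in boxplot()
    rows <- which(m[, j] %in% out)      # match within this column only
    data.frame(group = rep(j, length(rows)),
               value = unname(m[rows, j]),
               label = rownames(m)[rows],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, per_col)
}

# Dataset with the same values appearing in several columns
basenums <- c(1, 2, 3, 4, 8, 15, 30)
vv <- matrix(c(basenums, rev(basenums), 1 - basenums,
               c(0, 29, 30, 31, 32, 33, 60)), nrow = 7, ncol = 4)
rownames(vv) <- c("one", "two", "three", "four", "five", "six", "seven")

labs <- outlier_labels(vv)

boxplot(vv)
# text() fails on zero-length labels, so only label when outliers exist
if (nrow(labs) > 0) {
  text(labs$group + 0.2, labs$value, labs$label)
}
```

If a value is repeated within one column and is an outlier, every row holding it gets its own label, since the matching is done per column rather than against the whole matrix.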
Upvotes: 4