ZMacarozzi

Reputation: 717

Boxplot label outliers according to third variable

I am trying to create a boxplot where the outliers are labelled according to a third variable and not row names.

My data looks as follows: (now edited to contain multiple countries and sector as a factor variable)

country <- c(rep(LETTERS[1:5],2))
sector <- rep(c("one", "two"),5)
set.seed(200)
value <- round(rnorm(10),2)
dat <- data.frame(country,sector,value)
dat

   country sector value
1        A    one  0.08
2        B    two  0.23
3        C    one  0.43
4        D    two  0.56
5        E    one  0.06
6        A    two -0.11
7        B    one -1.02
8        C    two -0.30
9        D    one  0.17
10       E    two  1.42

Without the labels the boxplot is like this:

boxplot(value ~ sector, data=dat)

I want the labels on the outlier points to reflect the values for the variable country.

I found a similar question here, "Labeling outliers on boxplot in R", and I am trying to adapt its code as follows:

bxpdat <- boxplot(value ~ sector, data=dat)
text(bxpdat$group+0.2,                                           
bxpdat$out,                                                 
dat$country[which( dat$value == bxpdat$out, arr.ind=TRUE)[, 1]])  

However, I seem to be doing something wrong, because this does not work. I would greatly appreciate a suggestion on how to fix this code.

Thanks in advance!

Upvotes: 1

Views: 988

Answers (1)

MichaelChirico

Reputation: 34703

Slight adjustment:

x <- boxplot(value ~ sector, data=dat)

# x$group holds box indices (1, 2, ...); x$names maps them back to sector labels
text(x$group, x$out,
     labels=subset(dat, sector %in% x$names[x$group] &
                     value %in% x$out)$country, pos=4)

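This is not a great general solution, since the subset tests sector and value independently and might accidentally hit other points (for example, a non-outlier in one sector that shares its value with an outlier in another). A keyed lookup on sector and value together will work better, but I'm not sure how to do it in base: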

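library(data.table); setDT(dat, key = c("sector", "value"))

x <- boxplot(value ~ sector, data=dat)
# one (sector label, value) row per outlier; x$group holds box indices, not labels
out_keys <- data.table(sector = x$names[x$group], value = x$out)
text(x$group, x$out, dat[out_keys, country], pos = 4)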
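On the "not sure how to do it in base" point, here is one possible sketch. Instead of matching outlier values back to rows, it flags each row directly using boxplot.stats(), which is what boxplot() uses per group with the default coef = 1.5. The helper name flag_outliers is made up for this example, and the data frame is typed in from the table printed in the question rather than regenerated from the seed. Treat it as a sketch under those assumptions, not a tested general solution.

```r
# Hypothetical helper (not part of base R or the original answer):
# returns TRUE for every row that boxplot() would draw as an outlier
# within its own group.
flag_outliers <- function(value, group, coef = 1.5) {
  flags <- ave(value, group,
               FUN = function(v) v %in% boxplot.stats(v, coef = coef)$out)
  flags == 1  # ave() returns 0/1 for a numeric input, so convert to logical
}

# Data as printed in the question
dat <- data.frame(
  country = rep(LETTERS[1:5], 2),
  sector  = rep(c("one", "two"), 5),
  value   = c(0.08, 0.23, 0.43, 0.56, 0.06, -0.11, -1.02, -0.30, 0.17, 1.42)
)

is_out <- flag_outliers(dat$value, dat$sector)

boxplot(value ~ sector, data = dat)

# Boxes are drawn at x = 1, 2, ... in the order of the factor levels
xpos <- as.integer(factor(dat$sector))

# Only call text() when there is something to label
if (any(is_out)) {
  text(xpos[is_out], dat$value[is_out],
       labels = dat$country[is_out], pos = 4)
}
```

Because the flags are computed per row, the labels cannot be attached to the wrong point even when two sectors contain the same value.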

Upvotes: 2
