I am trying to create a boxplot where the outliers are labelled according to a third variable and not row names.
My data looks as follows (now edited to contain multiple countries, with sector as a factor variable):
country <- c(rep(LETTERS[1:5],2))
sector <- rep(c("one", "two"),5)
set.seed(200)
value <- round(rnorm(10),2)
dat <- data.frame(country,sector,value)
dat
   country sector value
1        A    one  0.08
2        B    two  0.23
3        C    one  0.43
4        D    two  0.56
5        E    one  0.06
6        A    two -0.11
7        B    one -1.02
8        C    two -0.30
9        D    one  0.17
10       E    two  1.42
Without labels, the boxplot is drawn with:
boxplot(value ~ sector, data=dat)
I want the labels on the outlier points to reflect the values for the variable country.
I found a similar question, "Labeling outliers on boxplot in R", and tried to adapt its code as follows:
bxpdat <- boxplot(value ~ sector, data=dat)
text(bxpdat$group+0.2,
bxpdat$out,
dat$country[which( dat$value == bxpdat$out, arr.ind=TRUE)[, 1]])
However, I seem to be doing something wrong, because this does not work. I would greatly appreciate a suggestion on how to fix this code.
Thanks in advance!
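To see what the adapted code has to work with, it helps to inspect the list that `boxplot()` returns. The sketch below hard-codes the values printed above instead of regenerating them with `set.seed(200)` (an assumption made only so the example is reproducible) and calls `boxplot()` with `plot = FALSE`. `out` holds the outlying values, `group` holds the index of the box each one belongs to, and `names` maps those indices back to sector labels. The comments also note two likely reasons the attempt above fails: `==` recycles the shorter vector instead of testing membership, and `which()` on a plain vector returns a vector, so `[, 1]` cannot index it.

```r
# Sketch: inspect boxplot()'s return value.
# Assumption: the question's printed values are hard-coded instead of set.seed(200) + rnorm().
dat <- data.frame(
  country = rep(LETTERS[1:5], 2),
  sector  = rep(c("one", "two"), 5),
  value   = c(0.08, 0.23, 0.43, 0.56, 0.06, -0.11, -1.02, -0.30, 0.17, 1.42)
)
bxpdat <- boxplot(value ~ sector, data = dat, plot = FALSE)

bxpdat$out                  # outlying values, box by box
bxpdat$group                # index (1, 2, ...) of the box each outlier belongs to
bxpdat$names[bxpdat$group]  # the same indices mapped back to sector labels

# Why the original attempt fails:
# - dat$value == bxpdat$out compares elementwise with recycling; %in% tests membership.
# - which() on a plain vector returns a vector, so [, 1] errors
#   with "incorrect number of dimensions".
dat$value %in% bxpdat$out
```

Note that `bxpdat$group` is numeric, so it never equals a sector label such as `"one"` directly; it has to go through `bxpdat$names` first.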
Slight adjustment:
x <- boxplot(value ~ sector, data=dat)
# x$group holds box indices (1, 2, ...), so map them to sector labels via x$names
text(x$group, x$out,
     labels=subset(dat, sector %in% x$names[x$group] &
                        value %in% x$out)$country, pos=4)
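This is not a great general solution, since the subset matching might accidentally hit other points, and the labels come out in row order rather than in the order of x$out. Joining each outlier back to its own row should work better, but I'm not sure how to do it in base R: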
library(data.table)
setDT(dat)
dat[ , {
  x <- boxplot(value ~ sector, data = dat)
  # look up each outlier's row by joining on (sector label, value)
  with(x, text(group, out,
               .SD[.(names[group], out), country, on = c("sector", "value")],
               pos = 4))
}]
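For a base R route, one possible sketch is to flag outliers row by row within each sector and then label exactly those rows. It uses `boxplot.stats()`, which applies the same 1.5 × IQR rule that `boxplot()` uses by default, together with `ave()` to run it per sector. The hard-coded values and the helper names `is_out`, `outl`, and `xpos` are assumptions for illustration, not part of either answer.

```r
# Sketch: label boxplot outliers with a third variable (country) in base R.
# Assumption: the question's printed values are hard-coded instead of set.seed(200) + rnorm().
dat <- data.frame(
  country = rep(LETTERS[1:5], 2),
  sector  = rep(c("one", "two"), 5),
  value   = c(0.08, 0.23, 0.43, 0.56, 0.06, -0.11, -1.02, -0.30, 0.17, 1.42)
)

bp <- boxplot(value ~ sector, data = dat)

# Flag each row as an outlier of its own sector (ave() returns 0/1, so convert to logical).
is_out <- as.logical(ave(dat$value, dat$sector,
                         FUN = function(v) v %in% boxplot.stats(v)$out))
outl <- dat[is_out, ]

# Box i is drawn at x = i, in the order of bp$names, so map each sector to its position.
xpos <- match(outl$sector, bp$names)
text(xpos, outl$value, labels = outl$country, pos = 4)
```

Because the flags are computed per row and per sector, each label stays attached to its own point regardless of row order. A value that is an outlier in one sector also cannot match a row in another sector.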