Nip

Reputation: 420

How to label ggplot2 boxplot outliers with a third variable?

library(ggplot2)

A <- c(rep(LETTERS[1:5],2))
B <- rep(c("one", "two"),5)
set.seed(200)
C <- round(rnorm(10),2)
dff <- data.frame(A,B,C)
dff

ggplot(dff, aes(x=B, y=C, fill=B)) + 
    geom_boxplot()

Is it possible to use A to label the outliers?

Upvotes: 0

Views: 256

Answers (2)

Caitlin

Reputation: 525

Here's a solution to label only the outliers in your data:

library(tidyverse)
outlier <- dff %>%
  group_by(B) %>%
  summarise(outlier = list(boxplot.stats(C)$out))


ggplot(dff, aes(x = B, y = C, fill = B)) +
  geom_boxplot() +
  geom_text(
    aes(label = if_else(C %in% unlist(outlier$outlier), as.character(A), "")),
    position = position_nudge(x = -0.1)
  )

which produces this plot:

[plot image]
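
One thing to watch: `C %in% unlist(outlier$outlier)` compares against the outlier values of all groups pooled together, so a value that is an outlier in one group would also be labelled if it happens to occur, non-outlying, in another. A minimal sketch of a per-group variant, assuming the same `dff` as in the question (the `labelled` and `lab` names are only illustrative):

```r
library(dplyr)
library(ggplot2)

# Same toy data as in the question
set.seed(200)
dff <- data.frame(
  A = rep(LETTERS[1:5], 2),
  B = rep(c("one", "two"), 5),
  C = round(rnorm(10), 2)
)

# Flag outliers separately within each level of B, so a value is only
# labelled when it is an outlier in its own group
labelled <- dff %>%
  group_by(B) %>%
  mutate(lab = if_else(C %in% boxplot.stats(C)$out, as.character(A), "")) %>%
  ungroup()

p <- ggplot(labelled, aes(x = B, y = C, fill = B)) +
  geom_boxplot() +
  geom_text(aes(label = lab), position = position_nudge(x = -0.1))
p
```

Because the label is computed inside a grouped `mutate()`, `boxplot.stats()` only ever sees the values of one box at a time.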

Upvotes: 1

Nip

Reputation: 420

I adapted the second answer to the question suggested in the first comment to fit my case:

library(dplyr)
library(ggplot2)

# TRUE for values more than 1.5 * IQR beyond the first or third quartile
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

# Flag outliers within each level of B; keep the A label only for flagged rows
dat <- dff %>%
  group_by(B) %>%
  mutate(label = ifelse(is_outlier(C), as.character(A), NA_character_)) %>%
  ungroup()

ggplot(dat, aes(x = B, y = C, fill = B)) +
  geom_boxplot() +
  geom_text(aes(label = label), na.rm = TRUE, nudge_y = 0.05)

Might not be the best answer :D
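
To see what the `is_outlier()` rule does, here is a small worked check on a made-up vector (the numbers are illustrative, not from the question's data). With these five values the quartiles are 2 and 4, so the IQR is 2 and the fences sit at -1 and 7; only 100 falls outside.

```r
# Same rule as above: beyond 1.5 * IQR from the first or third quartile
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

x <- c(1, 2, 3, 4, 100)   # hypothetical values for illustration
flags <- is_outlier(x)    # one logical flag per element of x
x[flags]                  # keeps only the flagged value, 100
```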

Upvotes: 1
