Cauder

Reputation: 2597

How do I jitter just the outliers in a ggplot boxplot?

I am using geom_jitter() for a boxplot with ggplot. I noticed it adds a point for every record on top of the boxplot, instead of jittering just the points that represent outliers.

The following code reproduces the problem:

library(ggplot2)
library(dplyr)

data <- as.data.frame(c(rnorm(10000, mean = 10, sd = 20), rnorm(300, mean = 90, sd = 5)))
names(data) <- "blapatybloo"

data %>% ggplot(aes("column", blapatybloo)) + geom_boxplot() + geom_jitter(alpha = .1)

How do I apply geom_jitter() to only the outlier points of the boxplot, without plotting all the other records on top of it?

Upvotes: 1

Views: 1575

Answers (1)

sumshyftw

Reputation: 1131

Create a new column that flags whether each data point is an outlier. Then hide the boxplot's built-in outlier points and overlay only the flagged points, jittered, onto the boxplot.

library(ggplot2)
library(dplyr)

data <- as.data.frame(c(rnorm(10000, mean = 10, sd = 20),
                        rnorm(300, mean = 90, sd = 5)))

names(data) <- "blapatybloo"

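# Boxplot outliers lie more than 1.5 * IQR beyond the quartiles (not the median)
data <- data %>%
  mutate(outlier = blapatybloo > quantile(blapatybloo, 0.75) +
           IQR(blapatybloo) * 1.5 | blapatybloo < quantile(blapatybloo, 0.25) -
           IQR(blapatybloo) * 1.5)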

data %>%
  ggplot(aes("column", blapatybloo)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(data = function(x) dplyr::filter(x, outlier),
             position = "jitter")
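The same idea can be wrapped in a small helper, sketched below. The name is_boxplot_outlier and the set.seed() call are my own additions for illustration, not part of ggplot2. The helper assumes the default whisker rule (1.5 * IQR beyond the first and third quartiles), and the jitter is restricted to the horizontal direction so the plotted values are not shifted vertically.

```r
library(ggplot2)
library(dplyr)

# Hypothetical helper: TRUE where a value lies beyond the boxplot whiskers,
# i.e. more than `coef` IQRs below the first or above the third quartile.
is_boxplot_outlier <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- coef * (q[[2]] - q[[1]])
  x < q[[1]] - fence | x > q[[2]] + fence
}

set.seed(1)  # assumption: fixed seed so the simulated data is reproducible
data <- data.frame(blapatybloo = c(rnorm(10000, mean = 10, sd = 20),
                                   rnorm(300, mean = 90, sd = 5)))

p <- data %>%
  mutate(outlier = is_boxplot_outlier(blapatybloo)) %>%
  ggplot(aes("column", blapatybloo)) +
  geom_boxplot(outlier.shape = NA) +  # hide the default outlier points
  geom_point(data = function(d) dplyr::filter(d, outlier),
             position = position_jitter(width = 0.2, height = 0),  # sideways only
             alpha = 0.3)
p
```

If a different whisker length is passed to geom_boxplot(coef = ...), the same value would need to be passed to the helper so the flagged points still match the whiskers.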

Upvotes: 2
