Reputation: 2597
I am using geom_jitter() for a boxplot with ggplot. I noticed that it adds a point for every record on top of the boxplot, instead of jittering just the points that represent outliers.
The following code demonstrates this:
library(ggplot2)
library(dplyr)  # provides %>%
data <- as.data.frame(c(rnorm(10000, mean = 10, sd = 20), rnorm(300, mean = 90, sd = 5)))
names(data) <- "blapatybloo"
data %>% ggplot(aes("column", blapatybloo)) + geom_boxplot() + geom_jitter(alpha = .1)
How do I apply geom_jitter() to only the outlier points of the boxplot, without drawing the rest of the records on top of it?
Upvotes: 1
Views: 1575
Reputation: 1131
Create a new column that flags whether each data point is an outlier. Then hide the boxplot's own outlier points with outlier.shape = NA and overlay only the flagged points, jittered, onto the boxplot.
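data <- as.data.frame(c(rnorm(10000, mean = 10, sd = 20),
                        rnorm(300, mean = 90, sd = 5)))
names(data) <- "blapatybloo"
# Flag points beyond the whiskers: more than 1.5 * IQR above the third
# quartile or below the first quartile (geom_boxplot()'s default rule)
data <- data %>%
  mutate(outlier = blapatybloo > quantile(blapatybloo, 0.75) + IQR(blapatybloo) * 1.5 |
           blapatybloo < quantile(blapatybloo, 0.25) - IQR(blapatybloo) * 1.5)
# Hide the default outlier points, then draw only the flagged rows with jitter
data %>%
  ggplot(aes("column", blapatybloo)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(data = function(x) dplyr::filter(x, outlier),
             position = "jitter")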
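As a variation on the same idea, here is a minimal sketch that moves the outlier test into a small helper and jitters only horizontally, so the plotted y values stay exact. The helper name is_boxplot_outlier, the seed, and the width/alpha values are my own assumptions for illustration and not part of ggplot2. It flags points lying more than 1.5 * IQR beyond the first or third quartile, which is how the boxplot whiskers are drawn by default.

```r
library(ggplot2)
library(dplyr)

# Hypothetical helper (not part of ggplot2): TRUE for values lying more than
# coef * IQR below the first quartile or above the third quartile.
is_boxplot_outlier <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x < q[[1]] - coef * iqr | x > q[[2]] + coef * iqr
}

set.seed(1)  # assumed seed, only so the sketch is reproducible
data <- data.frame(blapatybloo = c(rnorm(10000, mean = 10, sd = 20),
                                   rnorm(300, mean = 90, sd = 5)))

# Keep only the rows that the boxplot would draw as outliers
outliers <- data %>% filter(is_boxplot_outlier(blapatybloo))

p <- ggplot(data, aes("column", blapatybloo)) +
  geom_boxplot(outlier.shape = NA) +     # hide the default outlier points
  geom_jitter(data = outliers,           # draw only the flagged rows
              width = 0.2, height = 0,   # jitter sideways, keep y exact
              alpha = 0.3)
p
```

If the plot has several boxes, the same helper should work per group by calling it inside group_by(...) %>% mutate(...) before filtering.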
Upvotes: 2