Simon

Reputation: 10150

ggplot outline jitter datapoints

I'm trying to create a scatterplot where the points are jittered (geom_jitter), but I also want to create a black outline around each point. Currently I'm doing it by adding 2 geom_jitters, one for the fill and one for the outline:

beta <- paste("beta == ", "0.15")

ggplot(aes(x=xVar, y = yVar), data = data) + 
    geom_jitter(size=3, alpha=0.6, colour=my.cols[2]) + 
    theme_bw() + 
    geom_abline(intercept = 0.0, slope = 0.145950, size=1) + 
    geom_vline(xintercept = 0, linetype = "dashed") + 
    annotate("text", x = 2.5, y = 0.2, label=beta, parse=TRUE, size=5)+
    xlim(-1.5,4) + 
    ylim(-2,2)+
    geom_jitter(shape = 1,size = 3,colour = "black")

However, that results in outlines that are offset from the filled points.

Because jitter randomly offsets the data, the 2 geom_jitters are not in line with each other. How do I ensure the outlines are in the same place as the fill points?

I've seen threads about this (e.g. Is it possible to jitter two ggplot geoms in the same way?), but they're pretty old, and I'm not sure whether anything new has been added to ggplot that would solve this issue.

The code above works if, instead of using geom_jitter, I use the regular geom_point, but I have too many overlapping points for that to be useful

EDIT:

The solution in the posted answer works. However, it doesn't quite cooperate for some of my other graphs where I'm binning by some other variable and using that to plot different colours:

ggplot(aes(x=xVar, y = yVar, color=group), data = data) + 
    geom_jitter(size=3, alpha=0.6, shape=21, fill="skyblue") +
    theme_bw() +
    geom_vline(xintercept = 0, linetype = "dashed") +
    scale_colour_brewer(name = "Title", direction = -1, palette = "Set1") +
    xlim(-1.5,4) + 
    ylim(-2,2)

My group variable has 3 levels, and I want to colour each group level by a different colour in the brewer Set1 palette. The current solution just colours everything skyblue. What should I fill by to ensure I'm using the correct colour palette?

Upvotes: 4

Views: 6090

Answers (2)

awakenedhaki

Reputation: 301

This solution is a bit more involved, but I have been having trouble with the ones suggested previously.

I create a dummy jitter column and then shift it to the x-axis position of each group. I also flag the outliers in each group so they can be dropped from the data passed to the boxplot; otherwise they would appear twice, once as boxplot outliers and once as jittered points.

For the black halo, two geom_point layers are added, both using the same adjusted jitter coordinates. The first geom_point is drawn in black at a larger size than the second, and the second is given whichever colour is desired.

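library(ggplot2)
library(dplyr)

set.seed(123)
# 300 observations per group; y needs 900 values so none are recycled
df <- data.frame(group = rep(c("A", "B", "C"), 300),
                 y = rnorm(900))

ggplot(data = df, mapping = aes(x = group, y = y)) +
    geom_boxplot()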

set.seed(123)
processed_df <- df %>%
  group_by(group) %>%
  # Calculating & adjusting x-axis jitter coordinates
  mutate(x_jitter = runif(n(), min = 0.75, max = 1.25),
         group_adjusted_jitter = x_jitter + (cur_group_id() - 1)) %>%
  # Flagging outliers to prevent duplicate data points
  mutate(quantile_1 = quantile(y, probs = 0.25, na.rm = TRUE),
         quantile_3 = quantile(y, probs = 0.75, na.rm = TRUE),
         iqr = IQR(y, na.rm = TRUE),
         bottom_outlier = y < (quantile_1 - 1.5 * iqr),
         upper_outlier = y > (quantile_3 + 1.5 * iqr),
         outlier = bottom_outlier | upper_outlier) %>%
  ungroup()

ggplot(data = processed_df, mapping = aes(y = y)) +
  # Boxplot without outlier to prevent redundant points when jitter is added
  geom_boxplot(data = subset(processed_df, !outlier),
               mapping = aes(x = group)) +
  # Jittered point for black halo
  geom_point(mapping = aes(x = group_adjusted_jitter), 
             size = 3) +
  # Smaller sized jitter point with group coloring
  geom_point(mapping = aes(x = group_adjusted_jitter, color = group), 
             size = 2) 
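
A shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: geom_boxplot() can hide its own outlier points with outlier.shape = NA, so the outliers are drawn only once (by the jitter layer) while the box statistics are still computed from all rows. A single shape-21 layer then supplies both the black outline and the group colour. The data and the jitter width here are made up for illustration.

```r
library(ggplot2)

# made-up data, for illustration only
set.seed(123)
df <- data.frame(group = rep(c("A", "B", "C"), each = 100),
                 y = rnorm(300))

p <- ggplot(df, aes(x = group, y = y)) +
    # hide the boxplot's own outlier points; the quartiles still use every row
    geom_boxplot(outlier.shape = NA) +
    # filled circle: black outline, interior coloured by group; jitter in x only
    geom_jitter(aes(fill = group), shape = 21, colour = "black",
                size = 2, width = 0.25, height = 0)
p
```

Because height = 0, the points are only spread horizontally, so their y values stay exact.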

Upvotes: 0

alistaire

Reputation: 43344

You don't actually have to use two layers; you can use a point shape that has both an outline and a fill (shapes 21 to 25) and set its fill aesthetic:

library(ggplot2)

# some random data
set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))

ggplot(aes(x = x, y = y), data = df) + geom_jitter(shape = 21, fill = 'skyblue')

[Image: plot with light blue dots with black outlines]

The colour, size, and stroke aesthetics control the outline colour, the point size, and the outline thickness, respectively, so you can customize the exact look.
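
A minimal sketch of those aesthetics on the same kind of random data; the specific values (size 3, stroke 1.2) are arbitrary choices for illustration:

```r
library(ggplot2)

# made-up data, for illustration only
set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))

# shape 21 is a filled circle: `colour` draws the outline, `fill` the interior
p <- ggplot(df, aes(x = x, y = y)) +
    geom_jitter(shape = 21,
                fill = "skyblue",   # interior colour
                colour = "black",   # outline colour
                size = 3,           # point size
                stroke = 1.2)       # outline thickness
p
```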


Edit:

For grouped data, set the fill aesthetic to the grouping variable, and use scale_fill_* functions to set color scales:

# more random data
set.seed(47)
df <- data.frame(x = runif(100), y = rnorm(100), group = sample(letters[1:3], 100, replace = TRUE))

ggplot(aes(x=x, y = y, fill=group), data = df) + 
    geom_jitter(size=3, alpha=0.6, shape=21) +
    theme_bw() +
    geom_vline(xintercept = 0, linetype = "dashed") +
    scale_fill_brewer(name = "Title", direction = -1, palette = "Set1")

[Image: grouped scatterplot with outlined dots]
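
If two separate layers are still wanted (for example, an outline drawn with a different shape), one option is to give both layers the same jitter seed so the offsets match. This is a sketch that assumes ggplot2 3.0.0 or later, where position_jitter() accepts a seed argument; the data, jitter widths, and seed value are made up.

```r
library(ggplot2)

# made-up data, for illustration only
set.seed(47)
df <- data.frame(x = runif(100), y = rnorm(100))

# one position object with a fixed seed, shared by both layers
pos <- position_jitter(width = 0.05, height = 0.05, seed = 1)

p <- ggplot(df, aes(x = x, y = y)) +
    # fill layer
    geom_point(position = pos, size = 3, alpha = 0.6, colour = "skyblue") +
    # outline layer, jittered by exactly the same offsets
    geom_point(position = pos, size = 3, shape = 1, colour = "black")
p
```

Leaving seed at its default would let each layer draw its own random offsets, which is what causes the misaligned outlines described in the question.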

Upvotes: 6
