Overlay geom_line within a categorical x axis for each group - ggplot2

Question

I want to make a plot like this:

The boxes would represent the distribution of a continuous variable within groups; the red circles are the points showing all of the actual observations. So far, so good. This would be simple with geom_boxplot + geom_point with a group aesthetic.

Here are the two twists:

The horizontal position of the points is not a random jitter. Instead, each point sits at an X-Y coordinate that uses a continuous X axis within its group, rather than a categorical axis.
The line is a trendline that is fit on those points.

Some context: this plot shows usage of a product (Y axis) vs. allowed usage (X axis). The X axis groups are mutually exclusive, discrete tiers on what is essentially an unbounded, continuous usage variable, e.g., 1-4, 5-9, 10-20, etc. From a visual standpoint, it doesn't feel crazy to me to plot the continuous values within those groups. Does that make sense? I have no idea how to get started on getting ggplot2 to agree with me, though.

My preference is to have the box plots evenly spaced along the X axis, but if I need to start with a continuous axis and have the groups take up proportionate space on it, then I would settle for that (likely with a logged axis to prevent the lower, narrow groups from being completely smushed).

This should work as sample data:


df <- structure(list(
  usage = c(1L, 4L, 2L, 5L, 4L, 1L, 2L, 98L, 9L, 4L,
            6L, 6L, 1L, 2L, 2L, 2L, 3L, 2L, 5L, 1L),
  allowed = c(2, 20, 3, 3, 5, 5, 1, 1, 1, 5,
              10, 5, 7, 12, 2, 5, 23, 10, 5, 2),
  id = c(1055L, 2155L, 6637L, 11068L, 2070L, 8524L, 9157L, 5963L, 7593L, 3470L,
         3557L, 7469L, 9142L, 408L, 9446L, 1552L, 4788L, 7233L, 8464L, 2188L),
  group = c("A", "B", "A", "A", "A", "A", "A", "A", "A", "A",
            "B", "A", "B", "B", "A", "A", "B", "B", "A", "A")
), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

chemdork123 · Accepted Answer

Here's what I came up with for you:

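library(ggplot2)
library(dplyr)

# the sample data has a usage value of 98 that throws the y scale off
df <- df %>% filter(usage < 50)

p <-
  ggplot(df, aes(allowed, usage)) +
  geom_boxplot(aes(group = group)) +          # one box per group
  geom_point() +                              # points at their actual allowed/usage coordinates
  geom_smooth(alpha = 0, method = 'lm') +     # linear trendline; alpha = 0 hides the confidence band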
  facet_wrap(~group, scales='free_x', strip.position = 'bottom') +
  theme_classic() +
  theme(
    axis.text.x = element_blank(),       # remove x axis text
    axis.ticks.x = element_blank(),      # remove tick marks on x axis
    axis.title.x = element_blank(),      # remove title for axis
    strip.background = element_blank(),  # no box on facet label
    strip.placement = 'outside',         # facet label is outside axis line
    strip.text = element_text(size=12),
    panel.spacing.x = unit(0, 'pt')      # remove space between facets
  )
p

The general idea is to consider that you kind of have 2 x axes here. The primary axis by which you want to plot your points is df$allowed, whereas you then want to group based on df$group. The easiest solution I can think of here was to treat each value of df$group as a separate facet, and then "stitch" the facets together by setting the space in-between them to zero. Seems to work well.
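One consequence of treating each group as its own facet is that geom_smooth(method = 'lm') fits a separate linear model in every panel. As a rough check, the equivalent per-group fits can be sketched in base R. This assumes the sample data from the question (id column left out) with the usage value of 98 removed; the name `fits` is only for illustration:

```r
# Sample data from the question, without the id column
df <- data.frame(
  usage   = c(1, 4, 2, 5, 4, 1, 2, 98, 9, 4,
              6, 6, 1, 2, 2, 2, 3, 2, 5, 1),
  allowed = c(2, 20, 3, 3, 5, 5, 1, 1, 1, 5,
              10, 5, 7, 12, 2, 5, 23, 10, 5, 2),
  group   = c("A", "B", "A", "A", "A", "A", "A", "A", "A", "A",
              "B", "A", "B", "B", "A", "A", "B", "B", "A", "A")
)

# Same filter as in the answer: drop the very high usage value
df <- df[df$usage < 50, ]

# Fit usage ~ allowed separately for each group, as each facet's trendline does
fits <- lapply(split(df, df$group), function(d) {
  coef(lm(usage ~ allowed, data = d))
})

# One (intercept, slope) pair per group
print(fits)
```

Each element of `fits` holds the intercept and slope of the line drawn in the corresponding facet.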

The only other comment is that the boxes might be a bit too close together for your liking, making it hard to distinguish the points of one group from those of another. Since each group is a facet, and therefore a completely separate plot, you can "squish" the boxes and open up space between them by expanding the primary x axis of each facet like so:

p + scale_x_continuous(expand = expansion(mult = 0.8))
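To get a feel for what mult = 0.8 does, here is a small base-R sketch of the arithmetic. A multiplicative expansion pads each end of a continuous scale by mult times the data range, and with scales = 'free_x' that range is computed per facet. The helper `padded_limits` is hypothetical (not part of ggplot2) and only mirrors the calculation:

```r
# Hypothetical helper: limits of a continuous scale after multiplicative expansion
padded_limits <- function(x, mult) {
  r <- range(x)
  pad <- mult * diff(r)          # padding added on each side
  c(r[1] - pad, r[2] + pad)
}

# `allowed` values of group B in the sample data
allowed_b <- c(20, 10, 7, 12, 23, 10)

# Data range is 7 to 23, so the padding is 0.8 * 16 = 12.8 on each side
padded_limits(allowed_b, mult = 0.8)
```

For group B the limits go from 7 to 23 out to roughly -5.8 to 35.8, so the box takes up a much smaller share of the panel. A smaller value such as mult = 0.2 leaves less empty space; the default for continuous scales is 0.05.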

Note: I had to remove the very high usage value (98) in order to be able to actually see your plots properly. I imagine this is an artifact from copying over your data (a missing value, for example).
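Before dropping anything, it can be worth looking at exactly which rows the filter removes. A minimal base-R sketch, assuming the sample data from the question and the same usage < 50 cutoff used above (the name `dropped` is only for illustration):

```r
# Sample data from the question
df <- data.frame(
  usage   = c(1, 4, 2, 5, 4, 1, 2, 98, 9, 4,
              6, 6, 1, 2, 2, 2, 3, 2, 5, 1),
  allowed = c(2, 20, 3, 3, 5, 5, 1, 1, 1, 5,
              10, 5, 7, 12, 2, 5, 23, 10, 5, 2),
  id      = c(1055, 2155, 6637, 11068, 2070, 8524, 9157, 5963, 7593, 3470,
              3557, 7469, 9142, 408, 9446, 1552, 4788, 7233, 8464, 2188),
  group   = c("A", "B", "A", "A", "A", "A", "A", "A", "A", "A",
              "B", "A", "B", "B", "A", "A", "B", "B", "A", "A")
)

# Rows that the usage < 50 filter would remove
dropped <- df[df$usage >= 50, ]
print(dropped)
```

In the sample data this is a single row (id 5963, group A, usage 98, allowed 1), so the filter costs one observation.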
