I want to make a plot like this:
The boxes would represent the distribution of a continuous variable within groups; the red circles are the points showing all of the actual observations. So far, so good. This would be simple with geom_boxplot + geom_point with a group aesthetic.
Here are the two twists:
Some context:
This plot shows usage of a product (Y axis) vs. allowed usage (X axis). The X axis groups are mutually exclusive, discrete tiers of what is essentially an unbounded, continuous variable, e.g., 1-4, 5-9, 10-20, etc. From a visual standpoint, it doesn't feel crazy to me to plot the continuous variable within those groups; does that make sense? But I have no idea how to get started on getting ggplot2 to agree with me.
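If the tiers are not already a column in the data, one way to derive them from the continuous allowance is base R's cut(). This is a minimal sketch: the breaks are assumptions taken from the example tiers above, the open-ended "21+" tier is a hypothetical catch-all, and tier is an illustrative name, not a column in the sample data.

```r
# Allowance values copied from the sample data (df$allowed)
allowed <- c(2, 20, 3, 3, 5, 5, 1, 1, 1, 5, 10, 5, 7, 12, 2, 5, 23, 10, 5, 2)

# Right-closed intervals (0,4], (4,9], (9,20], (20,Inf] for integer allowances;
# the "21+" top tier is a hypothetical catch-all, not from the question
tier <- cut(
  allowed,
  breaks = c(0, 4, 9, 20, Inf),
  labels = c("1-4", "5-9", "10-20", "21+")
)

# cut() returns a factor whose levels follow the order of the breaks,
# so the tiers would be laid out left to right in this order
table(tier)
# counts: 1-4 = 8, 5-9 = 7, 10-20 = 4, 21+ = 1
```

Because the result is a factor, it can be used wherever a grouping column is expected, for example as the faceting or group variable.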
My preference is to have the box plots evenly spaced along the X axis. But if I need to start with a continuous axis and have the groups take up proportionate space on the X axis, I would settle for that (likely with a log-scaled axis to keep the lower, narrower groups from being completely squished).
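The fallback described above (a single continuous x axis on a log scale, with one box per group) might be sketched as follows. This is an illustration under assumptions, not a tested recommendation: it uses a trimmed copy of the sample data (id dropped, plain data frame), assumes ggplot2 is installed, and hides the boxplot's own outlier markers so that every observation is drawn exactly once as a red point.

```r
library(ggplot2)

# Sample data from the question (id omitted because it is not plotted)
df <- data.frame(
  usage   = c(1, 4, 2, 5, 4, 1, 2, 98, 9, 4, 6, 6, 1, 2, 2, 2, 3, 2, 5, 1),
  allowed = c(2, 20, 3, 3, 5, 5, 1, 1, 1, 5, 10, 5, 7, 12, 2, 5, 23, 10, 5, 2),
  group   = c("A", "B", "A", "A", "A", "A", "A", "A", "A", "A",
              "B", "A", "B", "B", "A", "A", "B", "B", "A", "A")
)

p_log <- ggplot(df, aes(allowed, usage)) +
  # continuous x plus a group aesthetic: one box per group,
  # positioned within that group's span of x values
  geom_boxplot(aes(group = group), outlier.shape = NA) +
  # every observation as a red point (outliers included)
  geom_point(colour = "red") +
  # log10 x axis so the narrow low tier is not squashed
  scale_x_log10() +
  theme_classic()

p_log
```

The trade-off is that box widths follow each group's range on the log scale, so the boxes are not evenly spaced.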
This should work as sample data:
df <- structure(list(usage = c(1L, 4L, 2L, 5L, 4L, 1L, 2L, 98L, 9L,
4L, 6L, 6L, 1L, 2L, 2L, 2L, 3L, 2L, 5L, 1L), allowed = c(2, 20,
3, 3, 5, 5, 1, 1, 1, 5, 10, 5, 7, 12, 2, 5, 23, 10, 5, 2), id = c(1055L,
2155L, 6637L, 11068L, 2070L, 8524L, 9157L, 5963L, 7593L, 3470L,
3557L, 7469L, 9142L, 408L, 9446L, 1552L, 4788L, 7233L, 8464L,
2188L), group = c("A", "B", "A", "A", "A", "A", "A", "A", "A",
"A", "B", "A", "B", "B", "A", "A", "B", "B", "A", "A")), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
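Before plotting, it can help to confirm that the groups are non-overlapping bands of allowed and to locate any extreme usage values. A base-R sketch using the same sample values (rebuilt as a plain data frame so it runs without tibble loaded):

```r
# Same values as the sample data, as a plain data.frame
df <- data.frame(
  usage   = c(1L, 4L, 2L, 5L, 4L, 1L, 2L, 98L, 9L, 4L,
              6L, 6L, 1L, 2L, 2L, 2L, 3L, 2L, 5L, 1L),
  allowed = c(2, 20, 3, 3, 5, 5, 1, 1, 1, 5, 10, 5, 7, 12, 2, 5, 23, 10, 5, 2),
  id      = c(1055L, 2155L, 6637L, 11068L, 2070L, 8524L, 9157L, 5963L, 7593L,
              3470L, 3557L, 7469L, 9142L, 408L, 9446L, 1552L, 4788L, 7233L,
              8464L, 2188L),
  group   = c("A", "B", "A", "A", "A", "A", "A", "A", "A", "A",
              "B", "A", "B", "B", "A", "A", "B", "B", "A", "A")
)

# Range of allowed within each group: A spans 1-5 and B spans 7-23,
# so the groups are non-overlapping tiers on the x variable
rng <- lapply(split(df$allowed, df$group), range)
rng

# Rows whose usage is far above the rest (only one row, id 5963, usage 98)
df[df$usage >= 50, ]
```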
Here's what I came up with for you:
library(ggplot2)
library(dplyr)

# one usage value (98) is far above the rest and squashes the y axis, so drop it
df <- df %>% dplyr::filter(usage < 50)

p <-
  ggplot(df, aes(allowed, usage)) +
  geom_boxplot(aes(group = group)) +
  geom_point() +
  # per-facet linear fit; alpha = 0 makes the confidence band transparent
  geom_smooth(alpha = 0, method = 'lm', formula = y ~ x) +
  facet_wrap(~group, scales = 'free_x', strip.position = 'bottom') +
  theme_classic() +
  theme(
    axis.text.x = element_blank(),         # remove x axis text
    axis.ticks.x = element_blank(),        # remove tick marks on x axis
    axis.title.x = element_blank(),        # remove title for axis
    strip.background = element_blank(),   # no box on facet label
    strip.placement = 'outside',           # facet label is outside axis line
    strip.text = element_text(size = 12),
    panel.spacing.x = unit(0, 'pt')        # remove space between facets
  )
p
The general idea is that you effectively have two x axes here. The primary axis, along which you want to plot your points, is df$allowed, whereas you then want to group based on df$group. The easiest solution I could think of was to treat each value of df$group as a separate facet, and then "stitch" the facets together by setting the space between them to zero. It seems to work well.
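One way to see why the stitching works is to inspect the built plot: with scales = 'free_x', each facet trains its own x scale on its own group's values only. A minimal sketch, assuming ggplot2 3.x, using the exported layer_scales() helper and a trimmed copy of the sample data (the theme settings are omitted because they do not affect the scales):

```r
library(ggplot2)

df <- data.frame(
  usage   = c(1, 4, 2, 5, 4, 1, 2, 98, 9, 4, 6, 6, 1, 2, 2, 2, 3, 2, 5, 1),
  allowed = c(2, 20, 3, 3, 5, 5, 1, 1, 1, 5, 10, 5, 7, 12, 2, 5, 23, 10, 5, 2),
  group   = c("A", "B", "A", "A", "A", "A", "A", "A", "A", "A",
              "B", "A", "B", "B", "A", "A", "B", "B", "A", "A")
)
# Same filter as the answer: drop the single usage value of 98
df <- df[df$usage < 50, ]

p <- ggplot(df, aes(allowed, usage)) +
  geom_boxplot(aes(group = group)) +
  geom_point() +
  facet_wrap(~group, scales = "free_x", strip.position = "bottom")

# With two facets, facet_wrap() uses one row and two columns:
# panel (row 1, col 1) is group A, panel (row 1, col 2) is group B
x_range_A <- layer_scales(p, i = 1, j = 1)$x$range$range
x_range_B <- layer_scales(p, i = 1, j = 2)$x$range$range
x_range_A  # 1 5
x_range_B  # 7 23
```

Because each panel only spans its own group's values and all panels get the same width, the groups end up evenly spaced regardless of how wide each tier is on the original scale.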
The only other comment is that the boxes might be a bit too close together for your liking, which makes it harder to tell the points of one group from those of the other. Since each group is a facet, and therefore a completely separate panel, you can "squish" the boxes by expanding the primary x axis of each facet like so:
p + scale_x_continuous(expand=expansion(mult=c(0.8)))
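To see why this squishes the boxes: expansion(mult = 0.8) pads each end of every free x scale by 80% of that panel's data range. The sketch below reproduces the arithmetic with scales::expand_range(), which applies the same multiplicative padding; it assumes ggplot2 >= 3.3.0 (where expansion() was introduced) and the scales package, and uses the per-group ranges of allowed from the sample data.

```r
library(ggplot2)
library(scales)

# expansion() returns c(lower mult, lower add, upper mult, upper add)
expansion(mult = 0.8)  # 0.8 0.0 0.8 0.0

# Panel limits after padding each side by 0.8 * (data range)
expand_range(c(1, 5), mul = 0.8)   # group A: -2.2  8.2
expand_range(c(7, 23), mul = 0.8)  # group B: -5.8 35.8

# Fraction of each panel's width still occupied by the data
1 / (1 + 2 * 0.8)  # about 0.385, whatever the group's range
```

Since the padding is proportional to each panel's own range, every group shrinks by the same factor, so the even spacing from the facets is preserved.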
Note: I had to remove one very high usage value (98) in order to see the rest of your data properly. I imagine this is an artifact of copying over your data (for example, a code for a missing value).