Denver Dang

Reputation: 2605

Overlaying two ggplot facet_wrap histograms

So I have two sets of histograms that I can plot one at a time. The following code gives a 2-row x 3-column facet plot of six different histograms:

ggplot(data) +
    aes(x=values) +
    geom_histogram(binwidth=2, fill='blue', alpha=0.3, color="black", aes(y=(..count..)*100/(sum(..count..)/6))) +
    facet_wrap(~ model_f, ncol = 3)

Here the aes(y = ...) mapping just shows percentages instead of counts.
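A minimal sketch of why the /6 is needed, assuming ggplot2 >= 3.3.0 (for after_stat()): as far as I understand ggplot2's build step, the y expression is evaluated on the binned statistics of the whole layer, so sum(count) runs over all six panels, and dividing by 6 only yields per-panel percentages when every panel has the same number of rows. Normalizing within each PANEL avoids that assumption. The built-in mtcars data, faceted by cyl (11, 7 and 14 rows), stands in for the question's data; binwidth = 2 is an arbitrary choice.

```r
library(ggplot2)

# mtcars stands in for the question's data: the cyl groups have 11, 7 and 14
# rows, so the panels are deliberately unequal in size.
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(
    # Divide each bin count by the total count of its own panel,
    # instead of by the total over all panels divided by the panel count
    mapping = aes(y = after_stat(count * 100 / tapply(count, PANEL, sum)[PANEL])),
    binwidth = 2
  ) +
  facet_wrap(~ cyl)

# Inspect the computed bar heights: each panel's bars should add up to 100 (%)
built <- layer_data(p)
panel_totals <- tapply(built$y, built$PANEL, sum)
print(panel_totals)
```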

As stated, I have two of these six-panel facet_wrap plots, which I now wish to combine to show that one is shifted relative to the other. In addition, the datasets are not the same size. For one I have:

# A tibble: 5,988 x 5
   values ID   structure   model   model_f
   <dbl> <chr>     <chr>   <chr>    <fctr>
 1     6     1    bone       qua   Model I
 2     7     1    bone       liu  Model II
 3    20     1    bone       dav Model III
 4     3     1    bone       ema  Model IV
 5     3     1    bone       tho   Model V
 6     4     1    bone      ranc  Model VI
 7     3     2    bone       qua   Model I
 8     5     2    bone       liu  Model II
 9    18     2    bone       dav Model III
10     2     2    bone       ema  Model IV
# ... with 5,978 more rows

And the other:

# A tibble: 954 x 5
    values  ID structure   model   model_f
   <dbl>  <chr>     <chr>   <chr>    <fctr>
 1     9     01    bone       qua   Model I
 2     8     01    bone       liu  Model II
 3    22     01    bone       dav Model III
 4     6     01    bone       ema  Model IV
 5     5     01    bone       tho   Model V
 6     9     01    bone       ran  Model VI
 7    12     02    bone       qua   Model I
 8    11     02    bone       liu  Model II
 9    24     02    bone       dav Model III
10     9     02    bone       ema  Model IV
# ... with 944 more rows

So the datasets differ in size and the IDs do not match (the data are unrelated), but I still wish to overlay the histograms to see how the two datasets differ.

I thought this might do the trick:

ggplot() +
    geom_histogram(data=data1, aes(x=values, y=(..count..)*100/(sum(..count..)/6)), binwidth=1, fill='blue', alpha=0.3, color="black") +
    geom_histogram(data=data2, aes(x=values, y=(..count..)*100/(sum(..count..)/6)), binwidth=1, fill='blue', alpha=0.3, color="black") +
    facet_wrap(~ model_f, ncol = 3)

However, that didn't do much.

So now I'm stuck. Is this possible to do, or...?

Upvotes: 4

Views: 4741

Answers (1)

Mark Peterson

Reputation: 9570

Here is my crack at this, based on the built-in iris dataset (since you did not provide reproducible data). To create the smaller, shifted dataset, I use dplyr to keep the first 20 rows of each species and add 1 to Sepal.Length for every observation:

library(dplyr)
library(ggplot2)

smallIris <-
  iris %>%
  group_by(Species) %>%
  slice(1:20) %>%
  ungroup() %>%
  mutate(Sepal.Length = Sepal.Length + 1)

Your code at the end gets you close, but both histograms use the same fill, so they cannot be told apart. If you give each one a different fill, they will show up differently. You can either set it directly (e.g., change "blue" to "red" in one of them) or map a name inside aes(). Mapping it inside aes() has the advantage of creating (and labeling) a legend:

ggplot() +
  geom_histogram(data=iris
                 , aes(x=Sepal.Length
                       , fill = "Big"
                       , y=after_stat(count*100/sum(count)))
                 , alpha=0.3) +
  geom_histogram(data=smallIris
                 , aes(x=Sepal.Length
                       , fill = "Small"
                       , y=after_stat(count*100/sum(count)))
                 , alpha=0.3) +
  facet_wrap(~Species)

Creates this:

[Image: overlaid "Big" and "Small" histograms of Sepal.Length, one panel per Species]
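Applied to the question's layout, a sketch of the same two-layer idea might look like the following. The data1 and data2 below are simulated stand-ins (Poisson values, balanced across six models), not the real data. With binwidth = 1, after_stat(density * 100) is the percentage of each dataset's rows per bin within each panel, which also avoids the /6 correction if the models are not equally represented.

```r
library(ggplot2)

# Hypothetical stand-ins for the question's data1 and data2: same columns
# (values, model_f), different sizes, and data2 shifted upwards.
set.seed(1)
model_levels <- paste("Model", c("I", "II", "III", "IV", "V", "VI"))
data1 <- data.frame(
  values  = rpois(5988, lambda = 8),
  model_f = factor(rep(model_levels, length.out = 5988), levels = model_levels)
)
data2 <- data.frame(
  values  = rpois(954, lambda = 11),
  model_f = factor(rep(model_levels, length.out = 954), levels = model_levels)
)

p <- ggplot() +
  # One layer per dataset; mapping fill to a constant label gives each layer
  # its own colour and a legend entry. With binwidth = 1, density * 100 is
  # the percentage of that dataset's rows in each bin, per panel.
  geom_histogram(data = data1,
                 aes(x = values, fill = "data1", y = after_stat(density * 100)),
                 binwidth = 1, alpha = 0.3, colour = "black") +
  geom_histogram(data = data2,
                 aes(x = values, fill = "data2", y = after_stat(density * 100)),
                 binwidth = 1, alpha = 0.3, colour = "black") +
  facet_wrap(~ model_f, ncol = 3) +
  labs(y = "Percent", fill = "Dataset")
```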

However, I dislike the look of overlapping histograms, so I would prefer a density plot. You can build one just like the above (just change the geom_histogram calls), but I think stacking the data gives you a bit more control, and it extends easily to more than two groups. Again, this uses dplyr to stitch the two datasets together:

bigIris <-
  bind_rows(
    small = smallIris
    , big = iris
    , .id = "Source"
  )

Then, you can create the plot relatively easily:

bigIris %>%
  ggplot(aes(x = Sepal.Length, col = Source)) +
  geom_line(stat = "density") +
  facet_wrap(~Species)

Creates:

[Image: "big" and "small" density curves of Sepal.Length, one panel per Species]
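The stacked data also works with histograms if you prefer bars. A sketch, assuming the same smallIris/bigIris construction as above (rebuilt here so the block runs on its own): mapping fill = Source puts both sources in one layer, position = "identity" overlays them instead of stacking them, and after_stat(density) scales each source within each panel so the 60-row and 150-row datasets are comparable. The binwidth of 0.25 is an arbitrary choice.

```r
library(dplyr)
library(ggplot2)

# Rebuild the two datasets so this block is self-contained
smallIris <- iris %>%
  group_by(Species) %>%
  slice(1:20) %>%
  ungroup() %>%
  mutate(Sepal.Length = Sepal.Length + 1)

# .id = "Source" labels each row with the argument name it came from
bigIris <- bind_rows(small = smallIris, big = iris, .id = "Source")

p <- ggplot(bigIris, aes(x = Sepal.Length, fill = Source)) +
  # position = "identity" overlays the two sources; the default "stack"
  # would pile one histogram on top of the other.
  # after_stat(density) makes each Source's bars have total area 1 per panel.
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 0.25, position = "identity", alpha = 0.4) +
  facet_wrap(~ Species)
```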

Upvotes: 4
