I have two histogram plots that I can draw one at a time. The following code gives a 2-row by 3-column facet plot of six different histograms:
ggplot(data) +
aes(x=values) +
geom_histogram(binwidth=2, fill='blue', alpha=0.3, color="black", aes(y=(..count..)*100/(sum(..count..)/6))) +
facet_wrap(~ model_f, ncol = 3)
Here, the aes(y = ...) mapping simply shows percentages instead of counts.
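To see why the `/6` is there, here is the same arithmetic on plain bin counts in base R. The data frame `toy` is hypothetical (6 models with 10 values each), and the bins only loosely mimic `binwidth = 2`. The point is that ggplot2 evaluates `sum(..count..)` over the whole layer rather than per panel, so dividing by 6 only gives per-panel percentages when every model has the same number of rows.

```r
# Hypothetical toy data: 6 models x 10 values each (a balanced design)
set.seed(1)
toy <- data.frame(
  values  = sample(1:10, 60, replace = TRUE),
  model_f = rep(paste("Model", 1:6), times = 10)
)

# Bin counts per model (rows) and per 2-unit bin (columns); each row plays the role of one facet
counts <- table(toy$model_f, cut(toy$values, breaks = seq(0, 10, by = 2)))

# Same formula as the y mapping: count * 100 / (total count / 6)
pct <- counts * 100 / (sum(counts) / 6)

# Each model's bins add up to 100 because every model has exactly 10 rows
rowSums(pct)
```

If the models were unbalanced, these row sums would drift away from 100, and a per-panel denominator would be needed instead.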
As stated, I have two of these six-panel facet_wrap plots, which I now wish to combine to show that one is more shifted than the other. In addition, the datasets are not the same size. For one I have:
# A tibble: 5,988 x 5
values ID structure model model_f
<dbl> <chr> <chr> <chr> <fctr>
1 6 1 bone qua Model I
2 7 1 bone liu Model II
3 20 1 bone dav Model III
4 3 1 bone ema Model IV
5 3 1 bone tho Model V
6 4 1 bone ranc Model VI
7 3 2 bone qua Model I
8 5 2 bone liu Model II
9 18 2 bone dav Model III
10 2 2 bone ema Model IV
# ... with 5,978 more rows
And the other:
# A tibble: 954 x 5
values ID structure model model_f
<dbl> <chr> <chr> <chr> <fctr>
1 9 01 bone qua Model I
2 8 01 bone liu Model II
3 22 01 bone dav Model III
4 6 01 bone ema Model IV
5 5 01 bone tho Model V
6 9 01 bone ran Model VI
7 12 02 bone qua Model I
8 11 02 bone liu Model II
9 24 02 bone dav Model III
10 9 02 bone ema Model IV
# ... with 944 more rows
So they are not the same size and the IDs are not the same (the data are not related), but I still wish to merge the histograms in order to see the difference between the two datasets.
I thought this might do the trick:
ggplot() +
geom_histogram(data=data1, aes(x=values, y=(..count..)*100/(sum(..count..)/6)), binwidth=1, fill='blue', alpha=0.3, color="black") +
geom_histogram(data=data2, aes(x=values, y=(..count..)*100/(sum(..count..)/6)), binwidth=1, fill='blue', alpha=0.3, color="black") +
facet_wrap(~ model_f, ncol = 3)
However, that did not give me what I wanted, and now I'm stuck. Is this possible to do?
Here is my crack at this, based on the built-in dataset iris (since you did not provide reproducible data). To create the smaller, shifted dataset, I am using dplyr to keep the first 20 rows from each species and add 1 to the sepal length of each observation:
library(dplyr)
library(ggplot2)

smallIris <-
iris %>%
group_by(Species) %>%
slice(1:20) %>%
ungroup() %>%
mutate(Sepal.Length = Sepal.Length + 1)
Your code at the end gets you close, but you did not specify different colors for the two histograms. If you set the fill differently for each, they will show up as distinct. You can either set it directly (e.g., change "blue" to "red" in one of them) or map a name inside aes. Mapping it inside aes has the advantage of creating (and labeling) a legend:
ggplot() +
geom_histogram(data=iris
, aes(x=Sepal.Length
, fill = "Big"
, y=(..count..)*100/(sum(..count..)))
, alpha=0.3) +
geom_histogram(data=smallIris
, aes(x=Sepal.Length
, fill = "Small"
, y=(..count..)*100/(sum(..count..)))
, alpha=0.3) +
facet_wrap(~Species)
This creates overlapping histograms in each species panel, with a fill legend distinguishing "Big" and "Small".
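Because your two datasets have very different sizes (5,988 vs 954 rows), it may also help to normalise each histogram separately within each panel, so that both sets of bars are percentages of their own data. The sketch below is an alternative I am suggesting, not the approach used above: it stacks the two iris sets into one data frame (`both` is a hypothetical name), maps `fill` to the source, and computes the percentage with `after_stat()` (available in ggplot2 3.3.0 and later), dividing each bin's count by the total for its panel and group via base R's `ave()`. The bin width of 0.25 and the axis label are arbitrary choices.

```r
library(dplyr)
library(ggplot2)

# Smaller, shifted copy of iris, built the same way as smallIris above
smallIris <- iris %>%
  group_by(Species) %>%
  slice(1:20) %>%
  ungroup() %>%
  mutate(Sepal.Length = Sepal.Length + 1)

# Stack both sets; .id records which one each row came from
both <- bind_rows(small = smallIris, big = iris, .id = "Source")

p <- ggplot(both, aes(x = Sepal.Length, fill = Source)) +
  geom_histogram(
    # count / (total count of the same panel and group) * 100,
    # so each source sums to 100% within each species panel
    aes(y = after_stat(100 * count / ave(count, PANEL, group, FUN = sum))),
    binwidth = 0.25, position = "identity", alpha = 0.3, colour = "black"
  ) +
  facet_wrap(~Species) +
  labs(y = "Percent of rows (per source, per panel)")

print(p)
```

With your data, the same idea would apply with `values` on the x axis and `facet_wrap(~ model_f, ncol = 3)`; `position = "identity"` keeps the two sources overlapping rather than stacked.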
However, I tend to dislike the look of overlapping histograms, so I would prefer a density plot. You can make one just like the above (just change the geom_histogram calls), but I think you get a bit more control (and the ability to extend this to more than two groups) by stacking the data. Again, this uses dplyr to stitch the two datasets together:
bigIris <-
bind_rows(
small = smallIris
, big = iris
, .id = "Source"
)
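A quick way to check that bind_rows() kept the two sources apart is to count the rows per Source and species. This minimal sketch repeats the smallIris construction from above (the first 20 rows of each of the three species) so that it runs on its own.

```r
library(dplyr)

smallIris <- iris %>%
  group_by(Species) %>%
  slice(1:20) %>%
  ungroup() %>%
  mutate(Sepal.Length = Sepal.Length + 1)

bigIris <- bind_rows(small = smallIris, big = iris, .id = "Source")

# One row per source and species: "big" has 50 rows per species, "small" has 20
count(bigIris, Source, Species)
```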
Then, you can create the plot relatively easily:
bigIris %>%
ggplot(aes(x = Sepal.Length, col = Source)) +
geom_line(stat = "density") +
facet_wrap(~Species)
This creates one density curve per source in each species panel, colored by Source.