Spartan 117

Reputation: 129

How to plot multiple distributions with ggplot?

[Image: the resulting ggplot]

I made this plot while trying to understand how to plot the distribution of each individual feature of my data frame. To check whether my procedure was correct, I wrote the following code to quickly plot two features.

library(ggplot2)

# Build a smaller data frame holding only two features (columns 3 and 4 of Carm);
# the column names are kept automatically
dat <- Carm[, 3:4]
dat <- stack(dat)  # long format: `values` (measurements) and `ind` (feature name)

# Now I use ggplot
ggplot(dat, aes(x = values)) +
  geom_histogram(binwidth = 0.5, color = "black", fill = "white") +
  geom_density(aes(group = ind, colour = ind, fill = ind), alpha = 0.2) +
  facet_wrap(~ ind, ncol = 2)

So, my question is: why are the densities so small compared to the histograms? How can I fix it?

Upvotes: 1

Views: 2371

Answers (2)

Rui Barradas

Reputation: 76402

To plot a density histogram, geom_histogram must be told not to plot counts. This is done by mapping the aesthetic y = ..density.. (in ggplot2 3.4.0 and later this notation is deprecated in favour of y = after_stat(density)). See the section "Computed variables" in help('geom_histogram'). I will use the built-in data set iris as example data.

library(ggplot2)

ggplot(dat, aes(values)) +
  geom_histogram(aes(y = ..density..), bins = 20, color = "black", fill ="white") +
  geom_density(aes(fill = ind), alpha = 0.2) +
  facet_wrap(~ ind)
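A minimal sketch of the same plot written with after_stat(density), assuming ggplot2 3.3.0 or later (where after_stat() was introduced) and tidyr are installed. It rebuilds `dat` as in the Data section so it runs on its own, and swaps `bins = 20` for a fixed `binwidth = 0.1` (an illustrative choice, not from the answer) so that `layer_data()` can be used to check that the bars in each panel have total area 1, the same normalisation as the density curve.

```r
library(ggplot2)
library(tidyr)

# Example data as in the Data section: petal length and width of the
# virginica flowers in long format (columns `ind` and `values`)
dat <- pivot_longer(iris[iris$Species == "virginica", 3:4],
                    everything(),
                    names_to = "ind",
                    values_to = "values")

binwidth <- 0.1  # illustrative bin width

p <- ggplot(dat, aes(values)) +
  # after_stat(density) is the current spelling of ..density..
  geom_histogram(aes(y = after_stat(density)), binwidth = binwidth,
                 color = "black", fill = "white") +
  geom_density(aes(fill = ind), alpha = 0.2) +
  facet_wrap(~ ind)

print(p)

# Computed data of the histogram layer: one row per bar
bars <- layer_data(p, 1)

# Total bar area per panel (density times bin width, summed);
# each should be 1, like the area under a density curve
area <- tapply(bars$density * binwidth, bars$PANEL, sum)
print(area)
```

Because both layers now integrate to 1, the bars and the curves share one y scale, which is why they line up in the answer's plot.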


Data

library(dplyr)
library(tidyr)

iris[iris$Species == "virginica", 3:4] %>% 
  pivot_longer(everything(), 
               names_to = "ind", 
               values_to = "values") -> dat
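The asker's own tool, base R's stack(), produces the same long format without dplyr or tidyr. A minimal sketch on the same iris subset; note that stack() puts the columns in the order `values`, `ind`, and that `ind` is a factor rather than a character vector, which works the same way in facet_wrap(~ ind).

```r
# Keep petal length and width (columns 3 and 4) of the virginica flowers
wide <- iris[iris$Species == "virginica", 3:4]

# stack() turns the two columns into long format: `values` holds the
# measurements and `ind` (a factor) holds the original column name
dat <- stack(wide)

str(dat)
```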

Upvotes: 1

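This is because geom_histogram plots the count of observations in each bin, while geom_density plots a kernel density estimate, scaled so that the total area under the curve is 1. With n observations and bins of width binwidth, a bar's height in counts equals its density multiplied by n × binwidth, which is why the curve looks so small whenever n × binwidth is larger than 1.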
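The mismatch is a factor of n × binwidth. A minimal sketch that makes this concrete with base R's hist() on the virginica petal lengths (illustrative data, not the asker's Carm), then applies the opposite fix to the one in the first answer: keep the histogram in counts and put the density on the same scale through geom_density's computed `count` variable (density × n), multiplied by the bin width. It assumes ggplot2 3.3.0 or later for after_stat(); the bin width 0.5 matches the question.

```r
library(ggplot2)

# Illustrative data: petal lengths of the 50 virginica flowers
x <- iris$Petal.Length[iris$Species == "virginica"]
n <- length(x)
binwidth <- 0.5

# hist() reports both the count and the density of each bin
h <- hist(x, breaks = seq(4, 7.5, by = binwidth), plot = FALSE)

# For every non-empty bin, count / density equals n * binwidth (here 25)
nonempty <- h$counts > 0
ratio <- h$counts[nonempty] / h$density[nonempty]
print(ratio)

# Opposite fix: keep the histogram in counts and rescale the density curve.
# geom_density's computed `count` is density * n, so multiply by the bin width too
p <- ggplot(data.frame(values = x), aes(values)) +
  geom_histogram(binwidth = 0.5, color = "black", fill = "white") +
  geom_density(aes(y = after_stat(count * 0.5)), fill = "red", alpha = 0.2)

print(p)

# Inspect the rescaled curve in the computed data of the density layer
dens <- layer_data(p, 2)
head(dens[, c("x", "density", "count", "y")])
```

The 0.5 inside after_stat() has to be kept in sync with the histogram's binwidth by hand; if the two differ, the curve and the bars drift apart again.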

Upvotes: 0
