Reputation: 129
I made this plot while trying to understand how to plot the distribution of each individual feature of my data frame. To check whether my procedure was correct, I wrote this code to quickly plot two features.
library(ggplot2)
Names <- names(Carm)
Label <- c(Names[3], Names[4]) # names of the two selected columns
dat <- data.frame(Carm[, 3], Carm[, 4]) # a smaller data frame with only two features
names(dat) <- Label
dat <- stack(dat) # long format: columns `values` and `ind`
# Now I use ggplot
ggplot(dat, aes(x = values)) +
  geom_histogram(binwidth = 0.5, color = "black", fill = "white") +
  geom_density(aes(group = ind, colour = ind, fill = ind), alpha = 0.2) +
  facet_wrap(~ ind, ncol = 2)
So, my question is: why are the densities so small compared to the histograms? How can I fix it?
Upvotes: 1
Views: 2371
Reputation: 76402
To plot a density histogram, geom_histogram needs to be told not to plot counts. This is done by mapping the aesthetic y = ..density.. (see the section Computed variables in help('geom_histogram')). I will use the built-in data set iris as the example data set.
library(ggplot2)
ggplot(dat, aes(values)) +
  geom_histogram(aes(y = ..density..), bins = 20, color = "black", fill = "white") +
  geom_density(aes(fill = ind), alpha = 0.2) +
  facet_wrap(~ ind)
Data
library(dplyr)
library(tidyr)
iris[iris$Species == "virginica", 3:4] %>%
  pivot_longer(everything(),
               names_to = "ind",
               values_to = "values") -> dat
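In ggplot2 3.4.0 and later the ..density.. notation is deprecated in favor of after_stat(density). A minimal self-contained sketch of the same plot in that style follows. Building dat with base R's stack() instead of pivot_longer() is my own simplification, and the object name p is only illustrative.

```r
library(ggplot2)

# Assumption: same virginica subset as above, reshaped with base R's stack(),
# which yields the columns `values` and `ind`
dat <- stack(iris[iris$Species == "virginica", 3:4])

p <- ggplot(dat, aes(values)) +
  # after_stat(density) puts the histogram on the density scale
  geom_histogram(aes(y = after_stat(density)), bins = 20,
                 color = "black", fill = "white") +
  geom_density(aes(fill = ind), alpha = 0.2) +
  facet_wrap(~ ind)

p
```

With this mapping the bars in each facet have a total area of 1, so they share a y scale with the density curve.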
Upvotes: 1
Reputation: 111
This is because geom_histogram plots counts per bin, whereas geom_density plots a density estimate scaled so that the area under the curve is 1, so its y values are on a much smaller scale than the counts.
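The relation between the two scales can be checked numerically: for equal-width bins, density = count / (n * binwidth). The sketch below uses the iris data as a stand-in for the asker's Carm data, which is an assumption, and the names x, h and p_counts are illustrative. It also shows one possible way to go the other direction and keep counts on the y axis, by scaling the density curve with the computed variable count from stat_density.

```r
library(ggplot2)

# Assumption: iris stands in for the original data
x <- iris$Petal.Length[iris$Species == "virginica"]
n <- length(x)
binwidth <- 0.5

# Base R histogram with equal-width bins, computed but not drawn
h <- hist(x, breaks = seq(4, 7, by = binwidth), plot = FALSE)

# Densities are the counts divided by n * binwidth ...
all.equal(h$density, h$counts / (n * binwidth))
# ... so the bar areas sum to 1
sum(h$density * binwidth)

# Alternative: keep the histogram as counts and scale the density curve up.
# after_stat(count) is density * n, so multiplying by the bin width (0.5)
# puts the curve on the same scale as the bars.
dat <- stack(iris[iris$Species == "virginica", 3:4])
p_counts <- ggplot(dat, aes(values)) +
  geom_histogram(binwidth = 0.5, color = "black", fill = "white") +
  geom_density(aes(y = after_stat(count) * 0.5, fill = ind), alpha = 0.2) +
  facet_wrap(~ ind)

p_counts
```

Which direction to choose depends on whether the y axis should read as counts or as density; the shape of both layers is the same either way.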
Upvotes: 0