ie86
ie86

Reputation: 157

Change count to probability in this histogram

I'm struggling to change the count to probability in the following histogram without messing the red area. Also, how to align the 1,10 with the rest of the numbers on the x-axis?

library(dplyr)
library(tibble)
library(ggplot2)
nrep = 10000
scientific <- function(x){
  ifelse(x==1e+0, "1", ifelse(x==1e+1,"10",parse(text=gsub("[+]", "", gsub("1e+", "10^", scales::scientific_format()(x))))))
}
bw <- 0.05
mx=rf(nrep,5,2)
df = tibble(x = mx)
ggplot(df,aes(x)) + 
geom_histogram(binwidth=bw, color="white", fill = "#1380A1") + 
geom_histogram(data=df %>% filter(x < 10^(-1) + 1.15*bw), binwidth=bw, color="white", fill = "red") +
geom_density(aes(y = bw*after_stat(count)), color="blue") +
scale_x_continuous(trans="log10", breaks = 10^seq(-1, 5, by = 1), labels = scientific)

enter image description here

Upvotes: 3

Views: 1644

Answers (1)

You need to change the scale from count to density in the layers geom_histogram. In the first histogram, you can do it using after_stat(density) which is equivalent to after_stat(count/sum(count))/bw. However, the same procedure does not work in the second histogram because sum(count) is different when you subset a dataset. If you do it, the second histogram will be on a different scale.

library(dplyr)
library(tibble)
library(ggplot2)
nrep = 10000
scientific <- function(x){
  ifelse(x==1e+0, "1", ifelse(x==1e+1,"10",parse(text=gsub("[+]", "", gsub("1e+", "10^", scales::scientific_format()(x))))))
}
bw <- 0.05
mx=rf(nrep,5,2)

df = tibble(x = mx) 
pdf <- df %>% filter(x < 10^(-1) + 1.15*bw)
ggplot() + 
  geom_histogram(data = df,
                 aes(x = x, y = after_stat(count/sum(count)/bw),
                 binwidth=bw, color="white", fill = "#1380A1") + 
  geom_histogram(data = pdf, 
                 aes(x = x, y = after_stat(count/sum(count)/bw),
                 binwidth=bw, color="white", fill = "red") +
  geom_density(data = df, 
               aes(x = x), color="blue") +
  scale_x_continuous(trans="log10", breaks = 10^seq(-1, 5, by = 1), 
                     labels = scientific)

enter image description here

Therefore, you need to calculate the density with the same denominator from the first histogram, which is defined as nrep.

library(dplyr)
library(tibble)
library(ggplot2)
nrep = 10000
scientific <- function(x){
  ifelse(x==1e+0, "1", ifelse(x==1e+1,"10",parse(text=gsub("[+]", "", gsub("1e+", "10^", scales::scientific_format()(x))))))
}
bw <- 0.05
mx=rf(nrep,5,2)

df = tibble(x = mx) 
pdf <- df %>% filter(x < 10^(-1) + 1.15*bw)
ggplot() + 
  geom_histogram(data = df,
                 aes(x = x, y = after_stat(density)),
                 binwidth=bw, color="white", fill = "#1380A1") + 
  geom_histogram(data = pdf, 
                 aes(x = x, y = after_stat(count/nrep)/bw),
                 binwidth=bw, color="white", fill = "red") +
  geom_density(data = df, 
               aes(x = x), color="blue") +
  scale_x_continuous(trans="log10", breaks = 10^seq(-1, 5, by = 1), 
                     labels = scientific)

enter image description here

Upvotes: 3

Related Questions