Rich Pauloo
Rich Pauloo

Reputation: 8412

Plot normal distribution onto data

I have some lognormal data that I want to transform, then fit a normal distribution to. Here is a reproducible example with data that's not exactly lognormal, but it's close enough:

# create some lognormal data
dat <- data.frame(x = c(runif(1000, 0, 1), runif(300, 1, 100), runif(100,100,1000)))

I can transform these data by log10 and fit a normal distribution to them with MASS::fitdistr.

# fit normal distribution to log-transformed values
library(MASS)
fit <- fitdistr(log10(dat$x), densfun = "normal")
fit$estimate # mean and SD of the normal distribution

Now I want to plot the data, and the distribution on top of it. I transform my data by log10, and plot the normal distribution with stat_function, but it's not fitting the data.

# plot data and distribution
ggplot(data = dat) +
  geom_histogram(mapping = aes(x = log10(x))) +
  stat_function(fun = dnorm, args = list(mean = fit$estimate[1], sd = fit$estimate[2], log = TRUE))

enter image description here

Any pointers, as well as verification if I'm going about this correctly, would be very helpful.

Lastly to get my x axis in displaying units of 10, 100, 1000, etc... should I use scale_x_log10()? What's a simple way to format the x axis?

Upvotes: 1

Views: 2952

Answers (1)

Marco Sandri
Marco Sandri

Reputation: 24262

When you want to draw on the same graph a histogram and a density distribution you need to plot a density histogram using the aesthetic y=..density..
Here is an example. To make things clear, I generated data from a lognormal distribution.

set.seed(123)
# Generate data from a log-normal distribution
dat <- data.frame(x=rlnorm(10000))

# Fit normal distribution to log-transformed values
library(MASS)
fit <- fitdistr(log10(dat$x), densfun = "normal")

# Plot density histogram and fitted distribution
ggplot(data = dat) +
  geom_histogram(mapping = aes(x = log10(x), y = ..density..), col="white") +
  stat_function(fun = dnorm, 
      args = list(mean = fit$estimate[1], sd = fit$estimate[2], log = F), 
      color="red", lwd=1)

enter image description here

Upvotes: 3

Related Questions