Avoid repeated geom calls in ggplot function with facet grid and unknown number of groups?

Question

I am trying to write a density-plot function that overlays a normal curve as a reference in each facet (group). Below, I have simplified the core issue by showing the code directly instead of wrapping it in a function.

Attempt

# Initial setup
library(dplyr)
data <- mtcars
group = "cyl"
variable = "mpg"
gform <- reformulate(".", response=group)
data[[group]] <- as.factor(data[[group]])

# Make data for normal curves
dat_norm <- data %>% group_by(.data[[group]]) %>% 
  summarise(mpg=seq(min(.[[variable]]), 
                  max(.[[variable]]), 
                  length.out=100),
            density=dnorm(seq(min(.[[variable]]), 
                        max(.[[variable]]), 
                        length.out=100), 
                    mean(.[[variable]]), 
                    sd(.[[variable]])))

# Make plot
library(ggplot2)
ggplot(data, aes_string(x=variable, fill=group)) +
  geom_density() +
  geom_line(data=dat_norm, aes_string(x=variable, y="density", group=group), size=1.2) +
  facet_grid(gform)

The problem is that ggplot appears to draw the same curve in every facet instead of a separate curve per group. I can build the curves manually, but that approach does not work for an unknown number of groups, which the final function must support.

Desired outcome

# As explained above, the previous figure has the same line for each facet.
# I would like to have the following instead:    
norm.1 <- data %>%
  filter(.[[group]]==levels(.[[group]])[1]) %>%
  with(data.frame(x = seq(min(.[[variable]]), 
                          max(.[[variable]]), 
                          length.out=100), 
                  y = dnorm(seq(min(.[[variable]]), 
                                max(.[[variable]]), 
                                length.out=100), 
                            mean(.[[variable]]), 
                            sd(.[[variable]])))) %>%
  mutate(cyl = factor(levels(data[[group]])[1],levels = levels(data[[group]])))


norm.2 <- data %>%
  filter(.[[group]]==levels(.[[group]])[2]) %>%
  with(data.frame(x = seq(min(.[[variable]]), 
                          max(.[[variable]]), 
                          length.out=100), 
                  y = dnorm(seq(min(.[[variable]]), 
                                max(.[[variable]]), 
                                length.out=100),
                            mean(.[[variable]]),
                            sd(.[[variable]])))) %>%
  mutate(cyl = factor(levels(data[[group]])[2],levels = levels(data[[group]])))

norm.3 <- data %>%
  filter(.[[group]]==levels(.[[group]])[3]) %>%
  with(data.frame(x = seq(min(.[[variable]]), 
                          max(.[[variable]]), 
                          length.out=100), 
                  y = dnorm(seq(min(.[[variable]]), 
                                max(.[[variable]]), 
                                length.out=100), 
                            mean(.[[variable]]), 
                            sd(.[[variable]])))) %>%
  mutate(cyl = factor(levels(data[[group]])[3],levels = levels(data[[group]])))


# Make plot
ggplot(data, aes_string(x=variable, fill=group)) +
  geom_density() +
  facet_grid(gform) +
  geom_line(data = norm.1, aes(x = x, y = y), size=1.2) +
  geom_line(data = norm.2, aes(x = x, y = y), size=1.2) +
  geom_line(data = norm.3, aes(x = x, y = y), size=1.2)

Question

As explained, the latter approach forces me to repeat the geom_line() call once per group. Inside the function, however, the number of groups is not known in advance. How can I avoid the repeated calls?

Note: This is a follow-up question to my previous question.

Dan Slone · Accepted Answer

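ggplot is behaving correctly. The problem is in the data frame you are creating (dat_norm): inside summarise(), the pipe placeholder . refers to the whole data frame, not to the current group, so the overall distribution is simply repeated 3 times. One small change, wrapping your summarise() in do(), makes . refer to each group's rows and so respects the grouping: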


# Initial setup
library(dplyr)
data <- mtcars
group = "cyl"
variable = "mpg"
gform <- reformulate(".", response=group)
data[[group]] <- as.factor(data[[group]])

# Make data for normal curves
dat_norm <- data %>% group_by(.data[[group]]) %>% 
# HERE IS THE CHANGE: wrap summarise() in do() so that `.` is each group's subset
  do(summarise(.,mpg=seq(min(.[[variable]]), 
                    max(.[[variable]]), 
                    length.out=100),
            density=dnorm(seq(min(.[[variable]]), 
                              max(.[[variable]]), 
                              length.out=100), 
                          mean(.[[variable]]), 
                          sd(.[[variable]]))))

# Make plot
library(ggplot2)
ggplot(data, aes_string(x=variable, fill=group)) +
  geom_density() +
  geom_line(data=dat_norm, aes_string(x=variable, y="density", group=group), size=1.2) +
  facet_grid(gform)
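
Since the goal is a reusable function, the same per-group computation can be wrapped up roughly as follows. This is only a sketch: the helper names normal_curves() and plot_density_with_normal() and the parameter n are made up for illustration and are not part of dplyr or ggplot2. It uses group_modify() in place of do(), which dplyr now marks as superseded, and it assumes a ggplot2 version that supports the .data pronoun in aes() and the linewidth argument (on older versions, size = 1.2 plays the same role).

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: one normal reference curve per level of `group`.
# `variable` and `group` are column names given as strings.
normal_curves <- function(data, variable, group, n = 100) {
  data[[group]] <- as.factor(data[[group]])
  data %>%
    group_by(across(all_of(group))) %>%
    group_modify(function(d, key) {
      # `d` holds only the rows of the current group
      v <- d[[variable]]
      grid <- seq(min(v), max(v), length.out = n)
      out <- data.frame(grid, dnorm(grid, mean(v), sd(v)))
      names(out) <- c(variable, "density")
      out
    }) %>%
    ungroup()
}

# Hypothetical wrapper: faceted density plot with a single geom_line() call,
# regardless of how many groups there are.
plot_density_with_normal <- function(data, variable, group, n = 100) {
  data[[group]] <- as.factor(data[[group]])
  dat_norm <- normal_curves(data, variable, group, n)

  ggplot(data, aes(x = .data[[variable]], fill = .data[[group]])) +
    geom_density() +
    geom_line(data = dat_norm, aes(y = .data[["density"]]), linewidth = 1.2) +
    facet_grid(reformulate(".", response = group)) +
    labs(x = variable, fill = group)
}

p <- plot_density_with_normal(mtcars, "mpg", "cyl")
```

Calling print(p) draws the plot. Because dat_norm carries the grouping column, facet_grid() routes each curve to its own panel, so nothing in the function depends on the number of groups.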
