Reputation: 1018
I am attempting to write a density-plot function that overlays a normal curve as a reference on each facet (group). Below, I have tried to simplify the core issue by not defining the function directly.
# Initial setup
library(dplyr)
data <- mtcars
group = "cyl"
variable = "mpg"
gform <- reformulate(".", response=group)
data[[group]] <- as.factor(data[[group]])
# Make data for normal curves
dat_norm <- data %>% group_by(.data[[group]]) %>%
summarise(mpg=seq(min(.[[variable]]),
max(.[[variable]]),
length.out=100),
density=dnorm(seq(min(.[[variable]]),
max(.[[variable]]),
length.out=100),
mean(.[[variable]]),
sd(.[[variable]])))
# Make plot
library(ggplot2)
ggplot(data, aes_string(x=variable, fill=group)) +
geom_density() +
geom_line(data=dat_norm, aes_string(x=variable, y="density", group=group), size=1.2) +
facet_grid(gform)
As you can see, the problem is that ggplot appears to apply the same data to every facet instead of customizing it by group. We can do it manually, but that approach does not allow for an unknown number of groups in the final function.
# As explained above, the previous figure has the same line for each facet.
# I would like to have the following instead:
norm.1 <- data %>%
filter(.[[group]]==levels(.[[group]])[1]) %>%
with(data.frame(x = seq(min(.[[variable]]),
max(.[[variable]]),
length.out=100),
y = dnorm(seq(min(.[[variable]]),
max(.[[variable]]),
length.out=100),
mean(.[[variable]]),
sd(.[[variable]])))) %>%
mutate_(cyl = factor(levels(data[[group]])[1],levels = levels(data[[group]])))
norm.2 <- data %>%
filter(.[[group]]==levels(.[[group]])[2]) %>%
with(data.frame(x = seq(min(.[[variable]]),
max(.[[variable]]),
length.out=100),
y = dnorm(seq(min(.[[variable]]),
max(.[[variable]]),
length.out=100),
mean(.[[variable]]),
sd(.[[variable]])))) %>%
mutate_(cyl = factor(levels(data[[group]])[2],levels = levels(data[[group]])))
norm.3 <- data %>%
filter(.[[group]]==levels(.[[group]])[3]) %>%
with(data.frame(x = seq(min(.[[variable]]),
max(.[[variable]]),
length.out=100),
y = dnorm(seq(min(.[[variable]]),
max(.[[variable]]),
length.out=100),
mean(.[[variable]]),
sd(.[[variable]])))) %>%
mutate_(cyl = factor(levels(data[[group]])[3],levels = levels(data[[group]])))
# Make plot
ggplot(data, aes_string(x=variable, fill=group)) +
geom_density() +
facet_grid(gform) +
geom_line(data = norm.1, aes(x = x, y = y), size=1.2) +
geom_line(data = norm.2, aes(x = x, y = y), size=1.2) +
geom_line(data = norm.3, aes(x = x, y = y), size=1.2)
As explained, the latter approach forces me to repeat the geom_line() call as many times as there are groups. However, within the function, we will not know the number of groups in advance. What would be the solution?
Note: This is a follow-up question to my previous question.
Upvotes: 1
Views: 120
Reputation: 563
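ggplot is behaving correctly. The data frame you are creating (dat_norm) simply repeats the overall distribution 3 times: inside summarise(), `.` is the pipe placeholder for the whole data frame, so `.[[variable]]` ignores the grouping. One small change to your summarise will make it respect the grouping: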
# Initial setup
library(dplyr)
data <- mtcars
group = "cyl"
variable = "mpg"
gform <- reformulate(".", response=group)
data[[group]] <- as.factor(data[[group]])
# Make data for normal curves
dat_norm <- data %>% group_by(.data[[group]]) %>%
  # HERE IS THE CHANGE: wrap the call in do(), so that `.` is the
  # current group's rows rather than the whole data frame
  do(summarise(., mpg=seq(min(.[[variable]]),
                          max(.[[variable]]),
                          length.out=100),
               density=dnorm(seq(min(.[[variable]]),
                             max(.[[variable]]),
                             length.out=100),
                             mean(.[[variable]]),
                             sd(.[[variable]]))))
# Make plot
library(ggplot2)
ggplot(data, aes_string(x=variable, fill=group)) +
geom_density() +
geom_line(data=dat_norm, aes_string(x=variable, y="density", group=group), size=1.2) +
facet_grid(gform)
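Since the goal is a reusable function, here is a minimal sketch of the same per-group idea in base R, without do(). The helper name `normal_curve_data` and its `n` argument are my own assumptions for illustration, not something from the question. It splits the data by group, builds one curve per piece, and keeps the grouping column so that facet_grid() can route each curve to its own panel, however many groups there are.

```r
# Hypothetical helper (name and arguments are illustrative assumptions):
# builds one normal reference curve per level of `group`, base R only.
normal_curve_data <- function(data, group, variable, n = 100) {
  group_levels <- levels(factor(data[[group]]))
  # Split the rows by group so every statistic below is computed per group.
  pieces <- split(data, data[[group]], drop = TRUE)
  curves <- lapply(names(pieces), function(level) {
    values <- pieces[[level]][[variable]]
    x <- seq(min(values), max(values), length.out = n)
    out <- data.frame(
      x = x,
      density = dnorm(x, mean = mean(values), sd = sd(values))
    )
    # Name the x column after the plotted variable (e.g. "mpg").
    names(out)[1] <- variable
    # Keep the grouping column so each curve lands in the matching facet.
    out[[group]] <- factor(level, levels = group_levels)
    out
  })
  do.call(rbind, curves)
}

dat_norm <- normal_curve_data(mtcars, group = "cyl", variable = "mpg")
```

The result has the same column names as the dat_norm built above (mpg, density, cyl), so it should drop into the existing geom_line(data=dat_norm, ...) call unchanged. Note that a group with a single observation has an undefined standard deviation, so its curve would be all NA.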
Upvotes: 1