Maya

Reputation: 579

Add geom_vline in function with multiple density plots

I have the following code:

densityPlots <- lapply(numericCols, function(var_x){
  p <- ggplot(df, aes_string(var_x)) + geom_density() 
  })

numericCols holds the names of the numeric columns. I want to add a vertical line at the mean of each variable, and I have tried several things, such as:

densityPlots <- lapply(numericCols, function(var_x){
  p <- ggplot(df, aes_string(var_x)) + geom_density() + geom_vline(aes(xintercept=mean(var_x)),
                    color="red", linetype="dashed", size=1)
  })

The data

str(df)
tibble [9 × 4] (S3: tbl_df/tbl/data.frame)
 $ A: num [1:9] 12 NA 34 45 56 67 78 89 100
 $ B: num [1:9] 1 2 3 NA 5 6 7 8 9
 $ C: num [1:9] 83 55 27 27 7 3 5 8 9
 $ D: num [1:9] 6 2 NA 1 NA 3 4 5 6

dput(df)
structure(list(A = c(12, NA, 34, 45, 56, 67, 78, 89, 100), B = c(1, 
2, 3, NA, 5, 6, 7, 8, 9), C = c(83, 55, 27, 27, 7, 3, 5, 8, 9
), D = c(6, 2, NA, 1, NA, 3, 4, 5, 6)), row.names = c(NA, -9L
), class = c("tbl_df", "tbl", "data.frame"))

print(numericCols)
[1] "A" "B" "C"

The attempt above does not work: the plots are drawn as if the geom_vline call were not there. Does anyone have a suggestion? Thanks :)!

Upvotes: 1

Views: 471

Answers (2)

Marco Sandri

Reputation: 24252

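In your attempt, var_x is a character string (the column name), so mean(var_x) returns NA and no line is drawn. You should use mean(df[, var_x], na.rm=TRUE) in geom_vline instead: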

library(ggplot2)

df <- structure(list(A = c(12, NA, 34, 45, 56, 67, 78, 89, 100), B = c(1, 
2, 3, NA, 5, 6, 7, 8, 9), C = c(83, 55, 27, 27, 7, 3, 5, 8, 9
), D = c(6, 2, NA, 1, NA, 3, 4, 5, 6)), row.names = c(NA, -9L
), class = c("tbl_df", "tbl", "data.frame"))
numericCols <- c("A","B","C")

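# df[, var_x] returns a plain vector only for a data.frame, so drop the tibble class
df <- as.data.frame(df)
densityPlots <- lapply(numericCols, function(var_x) {
  ggplot(df, aes_string(var_x)) + geom_density() +
    # var_x is a column name, so take the mean of the column it refers to
    geom_vline(aes(xintercept=mean(df[, var_x], na.rm=TRUE)),
               color="red", linetype="dashed", size=1)
})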

gridExtra::grid.arrange(grobs=densityPlots)
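
To see why the original attempt draws nothing, it may help to compare the two mean() calls outside ggplot. The following is a minimal base R sketch using the data from the question; the names bad and good are only illustrative. It also uses df[[var_x]], an alternative to df[, var_x] that returns a vector for tibbles as well, so the as.data.frame() step would not be needed with it.

```r
# The question's data (columns A to C), rebuilt as a plain data frame
df <- data.frame(
  A = c(12, NA, 34, 45, 56, 67, 78, 89, 100),
  B = c(1, 2, 3, NA, 5, 6, 7, 8, 9),
  C = c(83, 55, 27, 27, 7, 3, 5, 8, 9)
)
var_x <- "A"

# mean() of a character string is NA (R also raises a warning),
# which is presumably why geom_vline has nothing to draw
bad <- suppressWarnings(mean(var_x))

# Look the column up by name, then drop the missing value before averaging
good <- mean(df[[var_x]], na.rm = TRUE)

print(bad)
print(good)
```

Without na.rm = TRUE the second call would also return NA, because column A contains a missing value.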


Upvotes: 3

Ian Campbell

Reputation: 24790

Here is a somewhat different approach from the one in your question: it uses dplyr and tidyr to pivot the data into long format so that ggplot can map the variable name to an aesthetic. Unfortunately, geom_vline doesn't summarize by group, so you have to pre-compute the means:

set.seed(3)
data <- data.frame(Category = paste0("Category",LETTERS[1:20]),
                   lapply(LETTERS[1:10],function(x){setNames(data.frame(runif(20,10,100)),x)}))
numericCols <- LETTERS[1:10]

library(dplyr)
library(tidyr)
library(ggplot2)
# Reshape to long format, then compute one mean per variable
data.means <- data %>% 
  select(all_of(numericCols)) %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "var_x") %>%
  group_by(Variable) %>%
  summarize(Mean = mean(var_x))

# One density curve per variable, with its pre-computed mean as a dashed line
data %>% 
  select(all_of(numericCols)) %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "var_x") %>%
  ggplot(aes(x = var_x, color = Variable)) +
  geom_density() +
  geom_vline(data = data.means, aes(xintercept=Mean, color = Variable),
             linetype="dashed", size=1)


Or you could combine with facet_wrap for multiple plots.

# Same data, but one panel per variable; data.means is matched to panels by Variable
data %>% 
  select(all_of(numericCols)) %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "var_x") %>%
  ggplot(aes(x = var_x)) +
  facet_wrap(.~Variable) + 
  geom_density() +
  geom_vline(data = data.means, aes(xintercept=Mean, color = Variable),
             linetype="dashed", size=1)
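
The simulated data above has no missing values, so a plain mean() is enough there. The data in the question does contain NA values, so the pre-computed means would need na.rm = TRUE. As a rough sketch, assuming the question's df and numericCols, the same Variable/Mean table could also be built in base R with colMeans:

```r
# The question's data, rebuilt as a plain data frame
df <- data.frame(
  A = c(12, NA, 34, 45, 56, 67, 78, 89, 100),
  B = c(1, 2, 3, NA, 5, 6, 7, 8, 9),
  C = c(83, 55, 27, 27, 7, 3, 5, 8, 9),
  D = c(6, 2, NA, 1, NA, 3, 4, 5, 6)
)
numericCols <- c("A", "B", "C")

# One mean per selected column, ignoring missing values
col_means <- colMeans(df[numericCols], na.rm = TRUE)

# Same shape as data.means above: one row per variable
data.means <- data.frame(
  Variable = names(col_means),
  Mean = unname(col_means),
  stringsAsFactors = FALSE
)
print(data.means)
```

A table like this can be passed to geom_vline(data = data.means, ...) in the same way as the dplyr version.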


Upvotes: 2
