Reputation: 2597
I'd like to write a function that creates histograms across a number of dimensions and numeric variables.
My attempt mostly works, except that every histogram's x-axis takes on the values from the final loop iteration. For example, if the final iteration produced an x-axis spanning 0 to 5000, all of the histograms end up with that same x-axis.
This is my code:
library(dplyr)
library(stringr)
library(ggplot2)
groups_df <-
  data.frame(
    groups =
      c(rep("ABC", 3),
        rep("XYZ", 2)),
    cuts =
      c(c("cat", "dog", "bird"),
        c("red", "blue"))) %>%
  mutate(groups = as.character(groups),
         cuts = as.character(cuts))
numeric.variables <-
c("hops",
"skips",
"jumps")
visualizations <- list()
for (i in seq_along(groups_df$groups)) {
  for (j in seq_along(numeric.variables)) {
    visualizations[[groups_df$groups[i]]][[str_replace_all(tolower(groups_df$cuts[i]), " ", "_")]][[numeric.variables[j]]] <-
      mydf %>%
      filter(get(groups_df$groups[i]) == groups_df$cuts[i]) %>%
      ggplot(aes(get(numeric.variables[j]))) +
      geom_histogram() +
      labs(x = numeric.variables[j],
           title = paste0(str_replace_all(numeric.variables[j], "_", " "), " - ", tolower(groups_df$cuts[i])))
  }
}
Upvotes: 2
Views: 105
Reputation: 46908
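The problem comes from lazy evaluation inside the for loop. aes() stores the expression get(numeric.variables[j]) unevaluated, and ggplot2 only evaluates it when a plot is printed. By then the loop has finished and j holds its last value, so every stored plot maps the last numeric variable on its x-axis.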
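To make the lazy evaluation concrete, here is a minimal sketch that builds the same plot two ways inside a loop. The data frame `demo_df` and its columns `small` and `large` are made up for illustration and are not from the question. The second variant unquotes the column name with `!!sym(...)` so that it is fixed when `aes()` is called; this is one possible fix, not the only one.

```r
library(ggplot2)
library(rlang)

# Hypothetical toy data: two columns on very different scales
demo_df <- data.frame(small = c(1, 2, 3, 4, 5),
                      large = c(1000, 2000, 3000, 4000, 5000))
vars <- c("small", "large")

lazy_plots  <- list()
eager_plots <- list()
for (j in seq_along(vars)) {
  # Lazy: get(vars[j]) is stored as an expression and looked up at draw time
  lazy_plots[[vars[j]]] <-
    ggplot(demo_df, aes(get(vars[j]))) + geom_histogram(bins = 5)
  # Eager: !! substitutes the column symbol while the loop is still on this j
  eager_plots[[vars[j]]] <-
    ggplot(demo_df, aes(!!sym(vars[j]))) + geom_histogram(bins = 5)
}

# layer_data() builds the plot, so the lazy version now sees the final j
lazy_range  <- range(layer_data(lazy_plots[["small"]])$x)
eager_range <- range(layer_data(eager_plots[["small"]])$x)
```

After the loop, `lazy_range` reflects the `large` column even though the plot was stored under `"small"`, while `eager_range` reflects `small`. Building each plot inside a function, as in the lapply approach, avoids the issue in a different way because every call gets its own environment.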
My guess is that you want to split the numeric values by the levels of two (or more) categorical columns. If you would like the plots in a nested list, this is one way to do it:
library(ggplot2)
mydf = data.frame(cat_1 = sample(letters[1:3], 500, replace = TRUE),
                  cat_2 = sample(letters[4:5], 500, replace = TRUE),
                  var_1 = rnorm(500),
                  var_2 = rnorm(500)
)
# categorical columns
cat_columns = c("cat_1", "cat_2")
# numeric variables
num_columns = c("var_1", "var_2")
plts = lapply(cat_columns, function(col_){
  # iterate over the levels of that column
  p2 = lapply(unique(mydf[, col_]), function(cut_){
    # iterate over the numeric variables
    p3 = lapply(num_columns, function(var_){
      # subset once per plot so each histogram carries its own data
      thisdf = data.frame(x = mydf[mydf[, col_] == cut_, var_])
      return(ggplot(thisdf, aes(x = x)) + geom_histogram())
    })
    names(p3) = num_columns
    return(p3)
  })
  names(p2) = unique(mydf[, col_])
  return(p2)
})
names(plts) = cat_columns
plts[["cat_1"]][["a"]][["var_1"]]
A more compact approach is to pivot twice and then nest, which needs less code:
library(dplyr)
library(tidyr)
mydf %>%
  pivot_longer(-c(var_1, var_2), names_to = "cat", values_to = "cut") %>%
  pivot_longer(-c(cat, cut), names_to = "num") %>%
  nest(data = c(value))
# A tibble: 10 x 4
   cat   cut   num   data
   <chr> <fct> <chr> <list>
 1 cat_1 a     var_1 <tibble [145 x 1]>
 2 cat_1 a     var_2 <tibble [145 x 1]>
 3 cat_2 d     var_1 <tibble [257 x 1]>
 4 cat_2 d     var_2 <tibble [257 x 1]>
 5 cat_1 c     var_1 <tibble [173 x 1]>
 6 cat_1 c     var_2 <tibble [173 x 1]>
 7 cat_2 e     var_1 <tibble [243 x 1]>
 8 cat_2 e     var_2 <tibble [243 x 1]>
 9 cat_1 b     var_1 <tibble [182 x 1]>
10 cat_1 b     var_2 <tibble [182 x 1]>
Then we can use lapply to store the plots in a list column, provided your data is not too large:
plts = mydf %>%
  pivot_longer(-c(var_1, var_2), names_to = "cat", values_to = "cut") %>%
  pivot_longer(-c(cat, cut), names_to = "num") %>%
  nest(data = c(value)) %>%
  mutate(plots = lapply(data, function(i) qplot(i$value)))
plts %>% filter(cat == "cat_1" & num == "var_1" & cut == "a") %>% pull(plots)
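Since qplot() is deprecated as of ggplot2 3.4.0, here is a hedged variant of the same idea that builds each histogram with ggplot() and labels the x-axis with the variable name. It is only a sketch: the data frame `demo_df`, the seed, and the shifted mean for `var_2` are assumptions added so that the per-plot x-axes are visibly different.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)  # assumed seed, only to make the sketch reproducible
# Hypothetical data in the same shape as mydf; var_2 is shifted on purpose
demo_df <- data.frame(
  cat_1 = sample(letters[1:3], 200, replace = TRUE),
  cat_2 = sample(letters[4:5], 200, replace = TRUE),
  var_1 = rnorm(200),
  var_2 = rnorm(200, mean = 100, sd = 20)
)

nested <- demo_df %>%
  pivot_longer(c(cat_1, cat_2), names_to = "cat", values_to = "cut") %>%
  pivot_longer(c(var_1, var_2), names_to = "num") %>%
  nest(data = c(value)) %>%
  # Map() pairs each nested tibble with its variable name; every call has
  # its own environment, so nothing is left to be looked up lazily later
  mutate(plots = Map(function(d, label) {
    ggplot(d, aes(x = value)) +
      geom_histogram(bins = 30) +
      labs(x = label)
  }, data, num))

# Pull one plot out of the list column
one_plot <- nested %>%
  filter(cat == "cat_1", cut == "a", num == "var_2") %>%
  pull(plots)
one_plot <- one_plot[[1]]
```

Printing `one_plot` draws the histogram for `var_2` within level `a` of `cat_1`, with an x-axis determined by that subset alone.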
Upvotes: 4