Cauder

Reputation: 2597

How to make visualizations in a for loop

I'd like to make a function that creates histograms across a number of dimensions and numeric variables.

My attempt works, except that every histogram ends up with the x-axis from the final loop iteration. For example, if the last iteration's x-axis spans 0 to 5000, all of the histograms get that same x-axis.

This is my code:

  groups_df <- 
    data.frame(
      groups = 
          c(rep("ABC", 3),
          rep("XYZ", 2)),
      cuts = 
          c(c("cat", "dog", "bird"), 
            c("red", "blue"))) %>%
    mutate(groups = as.character(groups),
           cuts = as.character(cuts))
  
  numeric.variables <-
    c("hops",
      "skips",
      "jumps")
    
  visualizations <- list()
  
  for (i in seq_along(groups_df$groups)){

    for (j in seq_along(numeric.variables)){

      visualizations[[groups_df$groups[i]]][[str_replace_all(tolower(groups_df$cuts[i]), " ", "_")]][[numeric.variables[j]]] <-
        mydf %>% 
        filter(get(groups_df$groups[i]) == groups_df$cuts[i]) %>% 
        ggplot(aes(get(numeric.variables[j]))) +
        geom_histogram() + 
        labs(x = numeric.variables[j],
             title = paste0(str_replace_all(numeric.variables[j], "_", " "), " - ", tolower(groups_df$cuts[i])))
        
    }
  }
  

Upvotes: 2

Views: 105

Answers (1)

StupidWolf

Reputation: 46908

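The problem comes from lazy evaluation inside the for loop. aes() stores the expression get(numeric.variables[j]) without evaluating it, and it is only evaluated when the plot is printed. By then the loop has finished, so every plot uses the last value of j.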
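Here is a minimal sketch of the effect and of one possible fix inside a loop. It assumes a toy data frame `df` with made-up columns `small` and `large` (not your data), and it uses the `!!` unquoting that aes() supports to insert the current column name when the plot is built rather than when it is printed:

```r
library(ggplot2)

set.seed(1)
# Toy data; `small` and `large` are made-up column names for illustration.
df <- data.frame(small = runif(100, 0, 5), large = runif(100, 0, 5000))
vars <- c("small", "large")

# Lazy: aes() stores `get(vars[j])` unevaluated, so j is looked up at print time.
lazy <- list()
for (j in seq_along(vars)) {
  lazy[[vars[j]]] <- ggplot(df, aes(get(vars[j]))) + geom_histogram(bins = 30)
}

# Eager: `!!` substitutes the current column name into the mapping right away.
fixed <- list()
for (j in seq_along(vars)) {
  fixed[[vars[j]]] <- ggplot(df, aes(.data[[!!vars[j]]])) +
    geom_histogram(bins = 30) +
    labs(x = vars[j])
}

lazy[["small"]]   # x-axis runs to about 5000: it uses the last value of j
fixed[["small"]]  # x-axis runs to about 5
```

Building each plot inside a function, as in the lapply approach, avoids the same problem because each call gets its own copy of the variable.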

My guess is that you want to split the numeric values by the levels of two (or more) categorical columns. If you would like the plots in a nested list, this is one way to do it:

library(ggplot2)

mydf = data.frame(cat_1 = sample(letters[1:3],500,replace=TRUE),
                  cat_2 = sample(letters[4:5],500,replace=TRUE),
                  var_1 = rnorm(500),
                  var_2 = rnorm(500)
)

# categorical columns
cat_columns = c("cat_1","cat_2")
# numeric variables
num_columns = c("var_1","var_2")
 
plts = lapply(cat_columns,function(col_){
  #iterate categories of that column
  p2 = lapply(unique(mydf[, col_]), function(cut_){
    #iterate values
    p3 = lapply(num_columns, function(var_){
      thisdf = data.frame(x = mydf[mydf[, col_] == cut_, var_])
      return(ggplot(thisdf, aes(x = x)) + geom_histogram())
      
    })
    names(p3) = num_columns
    return(p3)
  })
  names(p2) =  unique(mydf[, col_])
  return(p2)
  
})

names(plts) = cat_columns

plts[["cat_1"]][["a"]][["var_1"]]


A better way is to pivot twice and nest the result, which takes less code:

library(dplyr)
library(tidyr)

mydf %>% 
  pivot_longer(-c(var_1,var_2), names_to = "cat", values_to = "cut") %>% 
  pivot_longer(-c(cat,cut), names_to = "num") %>% 
  nest(data = c(value))

# A tibble: 10 x 4
   cat   cut   num   data              
   <chr> <fct> <chr> <list>            
 1 cat_1 a     var_1 <tibble [145 x 1]>
 2 cat_1 a     var_2 <tibble [145 x 1]>
 3 cat_2 d     var_1 <tibble [257 x 1]>
 4 cat_2 d     var_2 <tibble [257 x 1]>
 5 cat_1 c     var_1 <tibble [173 x 1]>
 6 cat_1 c     var_2 <tibble [173 x 1]>
 7 cat_2 e     var_1 <tibble [243 x 1]>
 8 cat_2 e     var_2 <tibble [243 x 1]>
 9 cat_1 b     var_1 <tibble [182 x 1]>
10 cat_1 b     var_2 <tibble [182 x 1]>

Then, if your data is not too large, we can use lapply to store the plots in a list column:

plts = mydf %>% 
  pivot_longer(-c(var_1,var_2), names_to = "cat", values_to = "cut") %>% 
  pivot_longer(-c(cat,cut), names_to = "num") %>% 
  nest(data = c(value)) %>%
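  # qplot() is deprecated as of ggplot2 3.4.0, so build the histogram with ggplot()
  mutate(plots = lapply(data, function(i) ggplot(i, aes(x = value)) + geom_histogram()))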


plts %>% filter(cat=="cat_1" & num=="var_1" & cut=="a") %>% pull(plots)


Upvotes: 4
