Jordan

Reputation: 1495

Loop through aggregation using dplyr

I've tried to find the answer to this probably-obvious question. I have multiple predictor variables that I need to loop through in order to get a summary of another column for each predictor. The data frame will change with every iteration, so I need code that works for multiple different data frames. Here are the places I've looked so far:

R- producing a summary calculation for each column that is dependent on aggregations at a factor level

Multiple data frame handling

Using the mtcars dataset, this is what I've tried:

#get mtcars data from graphics package
install.packages("graphics")
library(graphics)
data <- mtcars 

#loop through names
variable <- list(colnames(data))
for(i in variable){
data1 <- data %>%
  group_by(i)
  summarise('number' = mean(mpg))
  }

However, I get the following error:

 Error in grouped_df_impl(data, unname(vars), drop) : 
 Column `i` is unknown

Not sure where to go next.

Upvotes: 1

Views: 1992

Answers (1)

akrun

Reputation: 887098

There are multiple issues in the code:

1) 'variable' is unnecessarily created as a list

2) Because of 1, looping through 'variable' does not reach the individual column names: the list has a single element, so the loop runs once with 'i' holding the whole vector of names

3) group_by looks for a column literally named 'i'; group_by_at can be used in place of group_by for string inputs

4) the pipe (%>%) is missing between group_by and summarise, so the chain is broken

5) the output should be stored in a list, or else it will be overwritten on every iteration because we keep assigning to the same object 'data1'
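
To see points 1 and 2 concretely, here is a minimal base-R sketch (the names `variable_list`, `variable_vec`, `iterations_list` and `iterations_vec` are made up for illustration). It counts how many times each loop body runs:

```r
# Illustrative names only; mtcars ships with base R (datasets package)
variable_list <- list(colnames(mtcars))  # a list with ONE element
variable_vec  <- colnames(mtcars)        # a character vector, one element per column

iterations_list <- 0
for (i in variable_list) {
  iterations_list <- iterations_list + 1
  names_seen_at_once <- length(i)        # i is the whole vector of column names
}

iterations_vec <- 0
for (i in variable_vec) {
  iterations_vec <- iterations_vec + 1   # i is a single column name each time
}

iterations_list      # loop over the list runs once
names_seen_at_once   # and sees all column names in that single pass
iterations_vec       # loop over the vector runs once per column
```

So with the list version, even a correctly written loop body would only ever be evaluated once.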


The code below makes these corrections:

library(dplyr)

variable <- colnames(data) # a character vector now, not a list
data1 <- list()            # initialize as a list
for (i in variable) {
  data1[[i]] <- data %>%
    group_by_at(i) %>%     # group_by_at accepts column names as strings
    summarise(number = mean(mpg))
}

Alternatively, this can be done in tidyverse style with purrr::map, which returns the output as a list of tibbles and avoids both initializing the list and assigning into it:

purrr::map(variable, ~ data %>%
                          group_by_at(.x) %>%
                          summarise(number = mean(mpg))) 
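
As a side note, in dplyr 1.0.0 and later the `_at` variants are superseded in favour of `across()`. A sketch of the same idea in that style, assuming dplyr >= 1.0.0 and purrr are installed (`by_column` is just an illustrative name), could look like this:

```r
library(dplyr)
library(purrr)

data <- mtcars
variable <- colnames(data)

# One tibble per column: mean mpg within each distinct value of that column
by_column <- map(variable, ~ data %>%
  group_by(across(all_of(.x))) %>%               # select the grouping column by its string name
  summarise(number = mean(mpg), .groups = "drop"))

by_column[[2]]  # the summary grouped by the second column, cyl
```

`all_of()` makes it explicit that `.x` holds a column name stored in a character vector rather than a column called `.x`.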

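If we need to combine the list elements into a single data frame, use bind_rows. However, because the first column has a different name in each tibble, the result will contain one column per grouping variable, filled with NA in the rows where it does not apply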

purrr::map(variable, ~ data %>%
                      group_by_at(.x) %>%
                      summarise(number = mean(mpg))) %>%
  purrr::set_names(variable) %>%   # name each tibble after its grouping column
  bind_rows(.id = 'variable')
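
If the NA-filled columns are not wanted, one possible workaround is to give the grouping column the same name in every tibble before binding. The sketch below is only one way to do that, assuming dplyr >= 1.0.0; the helper `summarise_by` and the column name `level` are hypothetical names introduced here, and the grouping values are converted to character so that columns of different types can still be stacked:

```r
library(dplyr)
library(purrr)

data <- mtcars
variable <- colnames(data)

# Hypothetical helper: mean mpg for each distinct value of one column,
# with the grouping values stored in a common column called `level`
summarise_by <- function(df, col) {
  df %>%
    group_by(level = as.character(.data[[col]])) %>%
    summarise(number = mean(mpg), .groups = "drop")
}

result <- variable %>%
  set_names() %>%                      # name each element after itself
  map(~ summarise_by(data, .x)) %>%
  bind_rows(.id = "variable")          # long format: variable, level, number

head(result)
```

The result has three columns (`variable`, `level`, `number`) and no NA padding, at the cost of storing every grouping value as text.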

Upvotes: 2
