dfrankow

Reputation: 21447

How to add a progress bar in a dplyr group_by and summarize?

Here is some code:

library(dplyr)
foo <- data.frame(a=runif(1000))
foo %>% group_by(a1=round(a, 1)) %>% summarize(num=n())

How can I get a progress bar on the group_by and/or summarize?

Note this example is simplified. The progress bar is more useful when the group_by and summarize are more expensive, so I can tell if it's going to complete in one minute, one hour, one day, or worse.

I see this question that talks about using rowwise, but I don't want rowwise. I see the deprecated progress_estimated, and the progress package it refers to, but it's not obvious to me how to modify the example above.

Upvotes: 1

Views: 970

Answers (1)

Sinh Nguyen

Reputation: 4497

I recommend splitting the data by group, saving each group's result to disk as soon as it finishes, and combining the results once all groups are done. Otherwise, if something unexpectedly interrupts R while everything is still in RAM, you risk losing all the work done so far.

Here is a sample solution that adds a progress bar using group_split:
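As a rough sketch of that save-as-you-go idea: the snippet below writes one .rds file per group and reuses any file that already exists, so a rerun after an interruption skips the finished groups. The names checkpoint_dir and process_group, the temporary directory, and the summary itself are assumptions made up for illustration, not part of the original code.

```r
library(dplyr)
library(purrr)

# Hypothetical checkpoint directory; use a persistent location in practice
checkpoint_dir <- file.path(tempdir(), "group_results")
dir.create(checkpoint_dir, showWarnings = FALSE, recursive = TRUE)

set.seed(100)
sample_data <- tibble(groups = rep(letters[1:10], 20),
                      number = runif(200, min = 0, max = 100))

# Hypothetical helper: summarize one group, reusing a saved result if present
process_group <- function(group_df) {
  key <- group_df$groups[1]
  out_file <- file.path(checkpoint_dir, paste0(key, ".rds"))
  if (file.exists(out_file)) {
    return(readRDS(out_file))  # finished in an earlier run, skip the work
  }
  result <- group_df %>%
    summarize(groups = first(groups), num = n(), mean_number = mean(number))
  saveRDS(result, out_file)    # persist before moving on to the next group
  result
}

# Process each group separately, then combine the per-group results
results <- sample_data %>%
  group_split(groups) %>%
  map(process_group) %>%
  bind_rows()
```

Deleting the files in checkpoint_dir forces a full recomputation.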

library(dplyr)
library(tidyr)
library(purrr)
library(progress)

set.seed(100)
sample_data <- tibble(groups = rep(letters[1:10], 20),
                      number = runif(200, min = 0, max = 100))

# summary function applied to each group; it also advances the progress bar
summary_fn <- function(group_df) {
  # sleep to simulate a long calculation; otherwise it would finish instantly
  Sys.sleep(runif(1, min = 0, max = 5))
  pb$tick()
  group_df %>%
    mutate(number = number + 5)
}

# create the progress bar with one tick per group
pb <- progress_bar$new(total = n_distinct(sample_data$groups))

splitted_data <- sample_data %>%
  # split the data into a list of groups, each of which is passed to summary_fn
  group_split(groups) %>%
  # map_dfr processes each group separately; because summary_fn ticks the
  # progress bar on every call, you can watch the progress group by group
  map_dfr(.f = summary_fn)

Created on 2022-07-21 by the reprex package (v2.0.1)
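If you would rather keep the group_by and summarize from the question, one possible sketch is to tick the bar inside the summary expression itself. This assumes summarize evaluates that expression once per group (which I believe is the case in dplyr 1.0 and later), and the names grouped and result are only illustrative.

```r
library(dplyr)
library(progress)

set.seed(100)
foo <- data.frame(a = runif(1000))

# Group first so the number of groups is known before the bar is created
grouped <- foo %>% group_by(a1 = round(a, 1))

# One tick per group
pb <- progress_bar$new(total = n_groups(grouped))

result <- grouped %>%
  summarize(num = {
    # Guard against ticking a finished bar, in case the expression is
    # evaluated more often than assumed
    if (!pb$finished) pb$tick()
    n()
  })
```

The bar is only drawn in an interactive session; the ticks are still counted otherwise.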

Here is a blurry GIF of what the progress bar looks like when running the group_split example:

[GIF of the progress bar]

Upvotes: 2
