Reputation: 21447
Here is some code:
library(dplyr)
foo <- data.frame(a=runif(1000))
foo %>% group_by(a1=round(a, 1)) %>% summarize(num=n())
How can I get a progress bar on the group_by and/or summarize?
Note this example is simplified. The progress bar is more useful when the group_by and summarize are more expensive, so I can tell if it's going to complete in one minute, one hour, one day, or worse.
I see this question that talks about using rowwise, but I don't want rowwise. I see the deprecated progress_estimated, and the progress package it refers to, but it's not obvious to me how to modify the example above.
Upvotes: 1
Views: 970
Reputation: 4497
I recommend setting up a mechanism that splits the data, saves each group's result to disk as soon as it is done, and combines the results once all groups are finished. Otherwise you risk losing the whole calculation if something unplanned interrupts R while everything is still in RAM.
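A minimal sketch of that checkpointing idea, under some assumptions: the output directory `out_dir`, the helper name `process_group`, and the per-group sum are all made up for illustration, and the real per-group work would replace the `summarize` call.

```r
library(dplyr)
library(purrr)

set.seed(100)
sample_data <- tibble(groups = rep(letters[1:10], 20),
                      number = runif(200, min = 0, max = 100))

# hypothetical checkpoint directory; any writable path would do
out_dir <- file.path(tempdir(), "group_results")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# process one group and save its result; skip the work if the file already exists
process_group <- function(group_df) {
  key <- as.character(group_df$groups[1])
  out_file <- file.path(out_dir, paste0(key, ".rds"))
  if (!file.exists(out_file)) {
    result <- group_df %>%
      summarize(groups = first(groups), total = sum(number))
    saveRDS(result, out_file)
  }
  out_file
}

# one result file per group
files <- sample_data %>%
  group_split(groups) %>%
  map_chr(process_group)

# combine the saved pieces once every group is done
combined <- files %>%
  map(readRDS) %>%
  bind_rows()
```

If R is interrupted partway through, rerunning the same code only recomputes the groups whose files are missing.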
Here is a sample solution that adds a progress bar using group_split:
library(dplyr)
library(tidyr)
library(purrr)
library(progress)
set.seed(100)
sample_data <- tibble(groups = rep(letters[1:10], 20),
                      number = runif(200, min = 0, max = 100))
# a function that processes one group and then updates the progress bar
summary_fn = function(group_df) {
  # sleep to simulate a long calculation; otherwise it would finish in no time
  Sys.sleep(runif(1, min = 0, max = 5))
  pb$tick()
  group_df %>%
    mutate(number = number + 5)
}
# create the progress bar: one tick per group (10 groups here)
pb <- progress_bar$new(total = 10)
splitted_data <- sample_data %>%
  # split the data into a list of groups that will be mapped over summary_fn
  group_split(groups) %>%
  # map_dfr processes each group separately; because summary_fn ticks the
  # progress bar on every call, you see the progress as the groups complete
  map_dfr(.f = summary_fn)
Created on 2022-07-21 by the reprex package (v2.0.1)
Here is a blurry GIF of what the progress bar looks like when running the code above.
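The same pattern could be applied to the example from the question. This is only a sketch: the helper name `summarize_one` and the intermediate `foo_grouped` are invented here, and the grouping column is materialised with `mutate` first so the number of groups is known before the bar is created.

```r
library(dplyr)
library(purrr)
library(progress)

set.seed(1)
foo <- data.frame(a = runif(1000))

# compute the grouping column up front so the groups can be counted
foo_grouped <- foo %>% mutate(a1 = round(a, 1))

# one tick per distinct group
pb <- progress_bar$new(total = n_distinct(foo_grouped$a1))

# summarize a single group, then advance the progress bar
summarize_one <- function(group_df) {
  res <- group_df %>% summarize(a1 = first(a1), num = n())
  pb$tick()
  res
}

result <- foo_grouped %>%
  group_split(a1) %>%
  map(summarize_one) %>%
  bind_rows()
```

The bar advances once per group, so it is most informative when there are many groups of roughly similar cost. Note that the progress package only draws the bar in interactive sessions by default.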
Upvotes: 2