Reputation: 5891
I like dplyr's "progress_estimated" function but I can't figure out how to get a progress bar to work inside a dplyr chain. I've put a reproducible example with code at the bottom here.
I have a pretty big data.frame like this:
cdatetime latitude longitude
1 2013-01-11 06:40:00 CST 49.74697 -93.30951
2 2013-01-12 15:55:00 CST 49.74697 -93.30951
3 2013-01-07 20:30:00 CST 49.74697 -93.30951
and I'd like to calculate sunrise times for each date, using the libraries
library(dplyr)
library(StreamMetabolism)
I can get dplyr's progress_estimated bar to work within a loop, e.g.:
Ugly loop (works)
p <- progress_estimated(nrow(test))
for (i in seq_len(nrow(test))) {
  p$tick()$print()
  datetime <- as.POSIXct(substr(test$cdatetime[i], 1, 20), tz = "CST6CDT")
  test$sunrise[i] <- sunrise.set(test$latitude[i], test$longitude[i], datetime, "CST6CDT", num.days = 1)[1, 1]
}
but how can I nest it in my function, so I can avoid using a loop?
Prefer to use:
SunriseSet <- function(dataframe, timezone) {
  dataframe %>%
    rowwise() %>%
    mutate(
      # calculate the date-time using the correct timezone
      datetime = as.POSIXct(substr(cdatetime, 1, 20), tz = timezone),
      # Get the time of sunrise and sunset on this day, at the county midpoint
      sunrise = sunrise.set(latitude, longitude, datetime, timezone, num.days = 1)[1, 1]
    )
}
How to get a progress bar here?
test2 <- SunriseSet(test, "CST6CDT")
Here's some example data:
test <- data.frame(
  cdatetime = rep("2013-01-11 06:40:00", 300),
  latitude = seq(49.74697, 50.04695, 0.001),
  longitude = seq(-93.30951, -93.27960, 0.0001)
)
Upvotes: 20
Views: 7312
Reputation: 846
This solution uses cli::cli_progress_bar inside rowwise(). .env is a dplyr variable that holds the current environment (inside rowwise); you need to pass its parent to cli_progress_update so that the update reaches the progress bar created outside the chain.
x <- tibble::tribble(
  ~a,
  1,
  2,
  3,
  4,
  5,
  6,
  7
)
cli::cli_progress_bar("Some progress", total = 7)
y <- x |>
  dplyr::rowwise() |>
  dplyr::mutate(b = (function(x) {
    Sys.sleep(50 / 100)
    cli::cli_progress_update(.envir = parent.env(.env))
    x
  })(a))
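If reaching into parent.env(.env) feels fragile, one alternative (a different technique from the one above, sketched under assumptions) is to keep the id that cli::cli_progress_bar returns and hand it to cli::cli_progress_update through its id argument. slow_identity is a hypothetical helper that stands in for the real per-row work:

```r
# Toy data: one column, seven rows
x <- tibble::tibble(a = 1:7)

# Keep the id of the bar so it can be updated from any environment
id <- cli::cli_progress_bar("Some progress", total = nrow(x))

# Hypothetical helper: simulates slow work, ticks the bar, returns its input
slow_identity <- function(value) {
  Sys.sleep(0.5)
  cli::cli_progress_update(id = id)
  value
}

y <- x |>
  dplyr::rowwise() |>
  dplyr::mutate(b = slow_identity(a)) |>
  dplyr::ungroup()
```

The bar is only drawn in an interactive session, and cli waits a short time before showing it, so very fast jobs may finish without any visible bar.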
Upvotes: 0
Reputation: 13118
Rather than using rowwise(), perhaps try pairing the map* functions from purrr with progress_estimated(). This answer follows the approach from https://rud.is/b/2017/03/27/all-in-on-r%E2%81%B4-progress-bars-on-first-post/.
First, wrap your function in another function that updates the progress bar:
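SunriseSet <- function(lat, long, date, timezone, num.days, .pb = NULL) {
  # Tick only when a progress bar was supplied and it has not finished yet;
  # without the is.null() check, the default .pb = NULL makes if() fail
  if (!is.null(.pb) && .pb$i < .pb$n) .pb$tick()$print()
  sunrise.set(lat, long, date, timezone, num.days)
}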
Then, iterate through your inputs with pmap, or pmap_df (to bind the outputs into a dataframe):
library(purrr)
pb <- progress_estimated(nrow(test), 0)
test2 <- test %>%
  mutate(
    sunrise = pmap_df(
      list(
        lat = latitude,
        long = longitude,
        date = as.character(cdatetime)
      ),
      SunriseSet,
      timezone = "CST6CDT", num.days = 1, .pb = pb
    )$sunrise
  )
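On purrr 1.0.0 or later, the map functions also accept a .progress argument, so the wrapper and progress_estimated() may not be needed at all. The following is a minimal sketch of that variant, not the approach from the linked post. fake_sunrise is a hypothetical stand-in for sunrise.set() so the example runs without StreamMetabolism:

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for sunrise.set(): returns a one-row data frame
fake_sunrise <- function(lat, long, date) {
  Sys.sleep(0.01)  # simulate slow work so the bar has time to appear
  data.frame(sunrise = paste(date, "sunrise"), sunset = paste(date, "sunset"))
}

# Example data from the question
test <- data.frame(
  cdatetime = rep("2013-01-11 06:40:00", 300),
  latitude = seq(49.74697, 50.04695, 0.001),
  longitude = seq(-93.30951, -93.27960, 0.0001)
)

test2 <- test %>%
  mutate(
    sunrise = pmap_chr(
      list(lat = latitude, long = longitude, date = as.character(cdatetime)),
      function(lat, long, date) fake_sunrise(lat, long, date)$sunrise,
      .progress = TRUE  # built-in progress bar (purrr >= 1.0.0)
    )
  )
```

To use it with the real data, swap fake_sunrise for a call to sunrise.set() and pick the pmap variant that matches the return type you want.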
Upvotes: 13
Reputation: 689
I don't really like my solution, but it works.
print_tick_function <- function(x, p) {
  p$tick()$print()
  data.frame(x)
}
SunriseSet <- function(dataframe, timezone) {
  p <- progress_estimated(nrow(dataframe))
  dataframe %>%
    rowwise() %>%
    do(print_tick_function(., p)) %>%
    mutate(
      datetime = as.POSIXct(substr(cdatetime, 1, 20), tz = timezone),
      sunrise = sunrise.set(latitude, longitude, datetime, timezone, num.days = 1)[1, 1]
    )
}
test2 <- SunriseSet(test, "CST6CDT")
Upvotes: -1