I am using dtplyr to speed up operations on a large tibble, and I have run into an error that I can't figure out. The following is a minimal example.
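library(dplyr)      # tibble(), group_by(), summarize(), mutate(), ...
library(dtplyr)     # lazy_dt()
library(tidyr)      # complete(), nesting()
library(lubridate)  # period(), seconds_to_period()

d <- tibble(
  rownnum = c(1L, 2L),
  stationID = c(1L, 2L),
  groupMemberID = c(0L, 0L),
  workCategory = factor(c("I", "A")),
  stationName = c("RW", "RW"),
  timeSpent = c(period(14060), period(3600)),
  time_grouping = c(as.POSIXct("2023-01-03 00:00:00"), as.POSIXct("2023-01-03 00:00:00")),
  time_grouping_label = c("2023-01-03", "2023-01-03")
)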
d
# A tibble: 2 × 8
  rownnum stationID groupMemberID workCategory stationName timeSpent time_grouping              time_grouping_label
    <int>     <int>         <int> <fct>        <chr>       <Period>  <dttm>                     <chr>
1       1         1             0 I            RW          14060S    2023-01-03 00:00:00.000000 2023-01-03
2       2         2             0 A            RW          3600S     2023-01-03 00:00:00.000000 2023-01-03
d <- d |>
  lazy_dt() |>
  group_by(stationID, stationName,
           time_grouping, time_grouping_label,
           workCategory) |>
  summarize(cat_time = sum(timeSpent), cat_count = n(),
            .groups = "drop_last") |>
  mutate(tot_time = sum(cat_time), pct = 100 * cat_time/tot_time,
         tot_count = sum(cat_count)) |>
  ungroup() |>
  mutate(cat_time_period = seconds_to_period(cat_time),
         tot_time_period = seconds_to_period(tot_time)) |>
  complete(workCategory, nesting(stationID, stationName,
                                 time_grouping, time_grouping_label),
           fill = list(cat_time = 0, cat_count = 0, tot_time = 0, pct = 0,
                       tot_count = 0, cat_time_period = period(0),
                       tot_time_period = period(0)),
           explicit = FALSE) |>
  relocate(workCategory, .after = time_grouping_label) |>
  arrange(stationID, time_grouping, workCategory) |>
  as_tibble()
Error in `common_by.list()` at dplyr/R/join-common-by.R:10:3:
! `by` can't contain join column `V2`, `explicit` which is missing from RHS.
Run `rlang::last_trace()` to see where the error occurred.
Any ideas as to what might be going wrong and how to fix it?
I have tried various things, such as removing the fill argument from complete() and replacing complete() with expand(), to no avail. I would appreciate any help.