dplyr to data.table for speed up execution time

Question

I am currently dealing with a moderately large dataframe called d.mkt (> 2M rows and 12 columns). As dplyr is too slow when applying summarise() function combined with group_by_at, I am trying to write an equivalent statement using data.table to speed up the summarise computation part of dplyr. However, the situation is quite special in the case that the original dataframe is group_by_at and then summarising over the same set of columns (e.g. X %>% select(-id) %>% group_by_at(vars(-x,-y,-z,-t) %>% summarise(x = sum(x), y = sum(y), z = sum(z), y = sum(t)) %>% ungroup()).

With that in mind, below is my current attempt, which kept failing to work because of this error: keyby or by has length (1,1,1,1). Could someone please help let me know how to fix this error?

dplyr's code

    d.mkt <- d.mkt %>%
  left_join(codes, by = c('rte_cd', 'cd')) %>%
  mutate(is_valid = replace_na(is_valid, FALSE),
         rte_cd = ifelse(is_valid, rte_cd, 'RC'),
         rte_dsc = ifelse(is_valid, rte_dsc, 'SKIPPED')) %>%
  select(-is_valid) %>%
  group_by_at(vars(-c_rv, -g_rv, -h_rv, -rn)) %>%
  summarise(c_rv = sum(as.numeric(c_rv)), g_rv = sum(as.numeric(g_rv)), h_rv = sum(as.numeric(h_rv)), rn = sum(as.numeric(rn))) %>%
  ungroup()

My attempt for translating the above

  d.mkt <- as.data.table(d.mkt)
  d.mkt <- d.mkt[codes, on = c('rte_cd', 'sb_cd'),
                        `:=` (is.valid = replace_na(is_valid, FALSE), rte_cd = ifelse(is_valid, rte_cd, 'RC00'), 
                              rte_ds = ifelse(is_valid, rte_ds, 'SKIPPED'))]
  d.mkt <- d.mkt[, -"is.valid", with=FALSE]
  d.mkt <- d.mkt[, .(c_rv=sum(c_rv), g_rv=sum(g_rv), h_rv = sum(h_rv), rn = sum(rn)), by = .('prop', 'date')] --- Error here already, but how do we ungroup a `data.table` though?

dplyr to data.table for speed up execution time

Answers (1)

Related Questions