marine-ecologist

Reputation: 182

dtplyr throws "invalid 'type' (closure) of argument" with group_by() %>% sample_n() (dplyr syntax) and with [, .SD[sample()], by = id] (data.table syntax)

I've been using dtplyr to speed up some overly complex dplyr code, and so far it's been excellent, apart from one issue I can't seem to resolve.

The problem is pretty straightforward to solve in both dplyr and data.table, but I can't see a way of applying either solution to a dtplyr_step object from lazy_dt() without using collect() or converting it back to a data.frame.

I'm trying to group a data frame by one column and, within each group, sample rows with replacement, where the number of draws comes from the values in another column (here, the group's maximum count).
Here's a working example in dplyr:

library(dplyr)

df <- data.frame(id=c("a","a","a","b","b","b","c","c","c","d","d","d"), 
                 count=sample(1:25, 12, replace=TRUE))

df %>% group_by(id) %>% sample_n(max(count), replace = TRUE)

and in data.table:

library(data.table)

dt <- data.table(id=c("a","a","a","b","b","b","c","c","c","d","d","d"), 
                 count=sample(1:25, 12, replace=TRUE))

dt[,.SD[sample(.N, max(count,.N), replace=TRUE)],by = id]
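To make the intended result explicit, the grouped resampling can also be sketched in base R. This is only an illustration of the target behaviour, mirroring the dplyr version (max(count) draws per group, with replacement). The names idx and resampled are made up for the sketch, and set.seed() is added only so the result is reproducible.

```r
set.seed(42)

df <- data.frame(id = rep(c("a", "b", "c", "d"), each = 3),
                 count = sample(1:25, 12, replace = TRUE))

# For each id, draw max(count) row positions with replacement
# from the rows belonging to that id.
idx <- unlist(lapply(split(seq_len(nrow(df)), df$id), function(rows) {
  n_draws <- max(df$count[rows])
  rows[sample.int(length(rows), n_draws, replace = TRUE)]
}))

# Index the original data frame by the sampled row positions.
resampled <- df[idx, ]
```

Each group in resampled should contain as many rows as that group's largest count value, which is what a dtplyr version would need to reproduce.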

However, both the dplyr and the data.table approach fail when applied to an equivalent "lazy" data.table created with lazy_dt() from the dtplyr package:

library(dtplyr)

df2 <- lazy_dt(df)

df2 %>% group_by(id) %>% sample_n(max(count), replace = TRUE)

fails with Error in max(count) : invalid 'type' (closure) of argument

df2[,.SD[sample(.N, max(count,.N), replace=TRUE)],by = id]

fails with Error in max(count, .N) : invalid 'type' (closure) of argument

Presumably this is because the count column is no longer recognised as numeric.
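One way to probe that guess: "closure" is R's term for an ordinary function, and dplyr exports a function called count(). If the name count is being looked up outside the table rather than as a column, max() would receive that function instead of numbers. The base R sketch below uses a hypothetical stand-in function rather than dplyr itself, so it only shows that this kind of lookup produces the same message, not that dtplyr necessarily behaves this way internally.

```r
# Hypothetical stand-in for dplyr::count(); any ordinary R function is a "closure".
count <- function(x, ...) x

# Passing the function itself to max(), instead of a numeric column,
# raises an error; capture its message rather than stopping the script.
msg <- tryCatch(max(count), error = function(e) conditionMessage(e))

print(typeof(count))
print(msg)
```

If this is what is happening, the column has not changed type at all; the expression max(count) is simply being evaluated where no count column is visible.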

Is there a way of doing this in dtplyr without converting the object back to a data.frame or data.table (other than recoding the original dplyr code entirely in data.table)?

Upvotes: 0

Views: 149

Answers (0)
