Reputation: 2874
Data tables in R have three (main) components: DT[i, j, by].
I am creating subsets of my data.table DT using the by functionality, which passes subsets of my data to j, where I can perform operations on them. Within each of the new subsets, I can specify the columns I want to use in j.
From the documentation (slightly altered by me):
DT[, lapply(.SD, mean), by=., .SDcols=...]
- applies fun (=mean) to all columns specified in .SDcols while grouping by the columns specified in by.
This is great functionality!
I would like to know if it is possible to supply arguments to the function being used in j - in this case: mean?
The function mean can take the following inputs:
mean(x, trim = 0, na.rm = FALSE, ...)
How can I use mean within the j section AND apply, for example, na.rm = TRUE?
On a side note, I did have a similar problem with the Reduce function, which applies a function to a data set recursively. The best idea I found was to create a custom version of the function to apply, so something like:
my_mean <- function(Data) {
  output <- mean(Data, na.rm = TRUE)
  return(output)
}
Then, using the example above, I would run:
DT[, lapply(.SD, my_mean), by=., .SDcols=...]
Upvotes: 1
Views: 56
Reputation: 2448
You can pass the extra arguments to lapply, which forwards them to the function:
library(data.table)
DT = data.table(x=c(1,2,3,4,NA), y=runif(5), z=c(1,1,1,2,2))
DT[, lapply(.SD, mean, na.rm=TRUE), by=z]
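To spell that out a bit: lapply(X, FUN, ...) hands anything in ... on to FUN, so named arguments like na.rm go straight through to mean. Below is a small sketch with fixed numbers instead of runif so the result is reproducible. The column names in .SDcols and the names res_dots / res_anon are just made up for this illustration. The second call does the same thing with an anonymous function, which may be handier if you need to set several arguments or the function does not take the data as its first argument.

```r
library(data.table)

# Illustrative data (fixed values, not from the question): x has one NA, z is the grouping column
DT <- data.table(x = c(1, 2, 3, 4, NA),
                 y = c(0.5, 1.5, 2.5, 3.5, 4.5),
                 z = c(1, 1, 1, 2, 2))

# Extra arguments after FUN are forwarded by lapply to mean()
res_dots <- DT[, lapply(.SD, mean, na.rm = TRUE), by = z, .SDcols = c("x", "y")]

# Same computation with an anonymous function wrapping mean()
res_anon <- DT[, lapply(.SD, function(col) mean(col, na.rm = TRUE)),
               by = z, .SDcols = c("x", "y")]

print(res_dots)
```

Both calls should give the same per-group means, with the NA in x ignored rather than turning the group mean into NA.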
Upvotes: 5