n1k31t4

Supplying function arguments for operations performed on data.table subsets created by `by`

A data.table query in R has three (main) components: DT[i, j, by].

I am creating subsets of my data.table DT using by, which passes each subset of my data to j, where I can perform operations on it. Within each subset, I can specify which columns I want to use in j.

From the documentation (slightly altered by me):

DT[, lapply(.SD, mean), by=., .SDcols=...] - applies fun (here, mean) to every column specified in .SDcols, grouping by the columns specified in by.

This is great functionality!
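A minimal sketch of that pattern, using a small hypothetical table (the column names grp, v1 and v2 are made up for illustration). Each group defined by `by` gets its own row, and `mean` is applied to every column listed in `.SDcols`.

```r
library(data.table)

# Hypothetical example data: one grouping column and two numeric columns
DT <- data.table(
  grp = c("a", "a", "b", "b"),
  v1  = c(1, 3, 10, 20),
  v2  = c(2, 4, 6, 8)
)

# Apply mean to each column named in .SDcols, separately for each value of grp
res <- DT[, lapply(.SD, mean), by = grp, .SDcols = c("v1", "v2")]
print(res)
# One row per group: a -> v1 = 2, v2 = 3; b -> v1 = 15, v2 = 7
```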

Is it possible to supply arguments to the function being used in j (in this case, mean)?

The function mean can take the following inputs:

mean(x, trim = 0, na.rm = FALSE, ...)

How can I use mean within j AND also pass it, for example, na.rm = TRUE?
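To make the problem concrete, here is a small sketch with hypothetical data (grp and v are illustrative names). Because mean() defaults to na.rm = FALSE, a single NA inside a group makes that whole group's result NA.

```r
library(data.table)

# Hypothetical data: column v has a missing value in group "b"
DT <- data.table(grp = c("a", "a", "b", "b"), v = c(1, 3, 10, NA))

# mean() is called with its default na.rm = FALSE, so the NA propagates
res_default <- DT[, lapply(.SD, mean), by = grp, .SDcols = "v"]
print(res_default)
# Group "a" gives 2; group "b" gives NA
```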


On a side note, I had a similar problem with the Reduce function, which applies a function to the elements of a data set recursively. The best idea I found was to create a custom version of the function to apply, something like:

my_mean <- function(Data) {
    # Same as mean(), but always drops NA values before averaging
    output <- mean(Data, na.rm = TRUE)
    return(output)
}

Then, using the example above, I would run:

DT[, lapply(.SD, my_mean), by=., .SDcols=...]
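A sketch of the wrapper approach next to an inline anonymous function, which does the same thing without a named helper. The data, the column names, and the `tables` list used for the Reduce part are hypothetical. For Reduce the wrapper really is needed, because Reduce() has no `...` argument for forwarding extra arguments to its function; here a wrapper fixes merge()'s `by` and `all` arguments.

```r
library(data.table)

DT <- data.table(grp = c("a", "a", "b", "b"), v = c(1, 3, 10, NA))

# Wrapper from the question: na.rm = TRUE is fixed inside the function
my_mean <- function(x) {
  mean(x, na.rm = TRUE)
}
res_wrapper <- DT[, lapply(.SD, my_mean), by = grp, .SDcols = "v"]

# Equivalent inline (anonymous) function
res_anon <- DT[, lapply(.SD, function(x) mean(x, na.rm = TRUE)),
               by = grp, .SDcols = "v"]

# Reduce() cannot forward extra arguments, so they are baked into f.
# Hypothetical example: full outer join of several tables on an "id" column.
tables <- list(
  data.table(id = 1:2, a = c(10, 20)),
  data.table(id = 2:3, b = c(30, 40)),
  data.table(id = c(1L, 3L), c = c(50, 60))
)
merged <- Reduce(function(x, y) merge(x, y, by = "id", all = TRUE), tables)
print(merged)
```

The anonymous function keeps the argument visible at the call site; a named wrapper such as my_mean is handier when the same settings are reused in many places.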

Answers (1)

Pekka

You can pass the extra arguments to lapply, which forwards any arguments after the function on to that function (through its ... argument):

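library(data.table)
DT = data.table(x = c(1, 2, 3, 4, NA), y = runif(5), z = c(1, 1, 1, 2, 2))
# na.rm = TRUE is forwarded by lapply to every mean() call
DT[, lapply(.SD, mean, na.rm = TRUE), by = z]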
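A runnable sketch of the same call with the results checked. The set.seed(1) call (so the random y column is reproducible) and the `.SDcols = "x"` variant are additions for illustration. For z == 1 the mean of x is 2; for z == 2 the NA is dropped and the mean is 4.

```r
library(data.table)

set.seed(1)  # makes the runif() column reproducible
DT <- data.table(x = c(1, 2, 3, 4, NA), y = runif(5), z = c(1, 1, 1, 2, 2))

# Arguments after the function (here na.rm = TRUE) are passed on to mean()
with_narm <- DT[, lapply(.SD, mean, na.rm = TRUE), by = z]
print(with_narm)

# The extra arguments combine with .SDcols to restrict which columns are used
only_x <- DT[, lapply(.SD, mean, na.rm = TRUE), by = z, .SDcols = "x"]
print(only_x)
```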
