Applying a function to all columns of a data.table together with a group-by

Question

I have a data.table with a large number of rows. I want to group the data table by one particular column, and I want to apply the same aggregation function to all the other columns. What is the appropriate way of doing that?

Here is some sample code to set up a data table that looks similar to what I have.

library(data.table)
# 5000 rows x 95 numeric columns, plus a grouping column gbc with 5 levels
my.table.tmp <- matrix(runif(5000*95), nrow=5000)
my.table <- data.table(my.table.tmp)
my.table[, gbc:=rep(c('A', 'B', 'C', 'D', 'E'), 1000)]

I want to group the table by the column gbc and aggregate each of the remaining 95 columns with the same function, let's say mean.

I see that

my.table[, lapply(.SD, mean), by=gbc]

gives me a table with the correct dimensions, but I am not sure if this is doing the right thing. If it is doing the right thing, can someone help me by breaking down what's happening here?

thelatemail · Accepted Answer

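Your approach is correct. .SD is a data.table holding the subset of rows for each by= group, with all columns except the by= column itself. Since a data.frame/data.table is just a list of columns, lapply loops over each column of .SD and applies the function (here, mean) to it. The per-group results are then stacked into one row per group.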
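To see this on something small enough to check by hand, here is a minimal sketch. The table dt and its columns x and y are made up for illustration and are not from the question. It shows which columns end up in .SD, compares the grouped result with a manual calculation for one group, and uses .SDcols to restrict the columns, which should be handy if not every column ought to be aggregated.

```r
library(data.table)

# Hypothetical toy table: two numeric columns plus the grouping column.
dt <- data.table(x = c(1, 2, 3, 4),
                 y = c(10, 20, 30, 40),
                 gbc = c("A", "B", "A", "B"))

# For each group, .SD contains every column except the by= column.
sd_names <- dt[, .(cols = paste(names(.SD), collapse = ",")), by = gbc]
print(sd_names)

# lapply() walks over the columns of .SD and returns a list of means,
# which data.table turns into one row per group.
res <- dt[, lapply(.SD, mean), by = gbc]
print(res)

# Manual check for group "A": subset the rows, then average each column.
manual_A <- sapply(dt[gbc == "A", .(x, y)], mean)
print(manual_A)

# .SDcols limits which columns go into .SD.
res_x <- dt[, lapply(.SD, mean), by = gbc, .SDcols = "x"]
print(res_x)
```

If the manual numbers for group "A" match the corresponding row of res, the grouped call is doing what you expect.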
