Reputation: 386
I have a data.table with a large number of rows. I want to group the data table by one particular column, and I want to apply the same aggregation function to all the other columns. What is the appropriate way of doing that?
Here is some sample code to set up a data table that looks similar to what I have.
library(data.table)
my.table.tmp <- matrix(runif(5000*95), nrow=5000)
my.table <- data.table(my.table.tmp)
my.table[, gbc:=rep(c('A', 'B', 'C', 'D', 'E'), 1000)]
I want to group the table by the factor column gbc, and I want all the remaining 95 columns to be aggregated by a function, let's say mean.
I see that
my.table[, lapply(.SD, mean), by=gbc]
gives me a table with the correct dimensions, but I am not sure if this is doing the right thing. If it is doing the right thing, can someone help me by breaking down what's happening here?
Upvotes: 6
Views: 5091
Reputation: 93803
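Your description sounds correct. .SD is just the subset of the data for each by= group, holding every column except the grouping column itself, and since a data.frame/data.table is just a list stuck together as columns, lapply will loop over each column and apply the function (here mean) to it. The result is one row per group, with one aggregated value per column.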
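To make the breakdown concrete, here is a minimal sketch on a tiny table. The table dt and its columns x and y are hypothetical, made up for illustration, and are not the asker's data. It shows which columns end up in .SD and checks the grouped result against a mean computed by hand for one group.

```r
library(data.table)

# Hypothetical toy table: two numeric columns plus the grouping column
dt <- data.table(x = c(1, 2, 3, 4),
                 y = c(10, 20, 30, 40),
                 gbc = c("A", "B", "A", "B"))

# Inspect .SD: for every group it holds all columns except the by= column
sd.names <- dt[, .(cols = paste(names(.SD), collapse = ",")), by = gbc]

# lapply walks over the columns of .SD and returns a list of per-column means;
# data.table binds one such list per group into the result
res <- dt[, lapply(.SD, mean), by = gbc]

# Cross-check one cell by hand: mean of x within group "A"
manual <- dt[gbc == "A", mean(x)]
stopifnot(res[gbc == "A", x] == manual)

print(sd.names)
print(res)
```

If only some columns should be aggregated, the .SDcols argument restricts what goes into .SD, for example dt[, lapply(.SD, mean), by = gbc, .SDcols = "x"].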
Upvotes: 2