Alex
Alex

Reputation: 19883

aggregating multiple columns in data.table

I have the following sample data.table:

dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))

I would like to aggregate all columns (a and b, though they should be kept separate) by id using colSums, for example. What is the correct way to do this? The following does not work:

 dtb[,colSums, by="id"]

This is just a sample and my table has many columns so I want to avoid specifying all of them in the function name

Upvotes: 17

Views: 11673

Answers (2)

Alex
Alex

Reputation: 19883

this is actually what i was looking for and is mentioned in the FAQ:

dtb[,lapply(.SD,mean),by="id"]

Upvotes: 30

Christoph_J
Christoph_J

Reputation: 6884

I guess in this case is it fastest to bring your data first into the long format and do your aggregation next (see Matthew's comments in this SO post):

library(data.table)
dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))
library(reshape2)
dt_long <- as.data.table(melt(dtb, id.var="id"))
dt_long[, sum(value), by=c("id","variable")]
    id variable  V1
 1:  1        a 601
 2:  2        a 440
 3:  3        a 496
 4:  4        a 553
 5:  5        a 444
 6:  6        a 466
 7:  7        a 525
 8:  8        a 553
 9:  9        a 541
...

Upvotes: 0

Related Questions