Reputation: 19883
I have the following sample data.table
:
dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))
I would like to aggregate all columns (a and b, though they should be kept separate) by id using colSums
, for example. What is the correct way to do this? The following does not work:
dtb[,colSums, by="id"]
This is just a sample and my table has many columns so I want to avoid specifying all of them in the function name
Upvotes: 17
Views: 11673
Reputation: 19883
this is actually what i was looking for and is mentioned in the FAQ:
dtb[,lapply(.SD,mean),by="id"]
Upvotes: 30
Reputation: 6884
I guess in this case is it fastest to bring your data first into the long format and do your aggregation next (see Matthew's comments in this SO post):
library(data.table)
dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))
library(reshape2)
dt_long <- as.data.table(melt(dtb, id.var="id"))
dt_long[, sum(value), by=c("id","variable")]
id variable V1
1: 1 a 601
2: 2 a 440
3: 3 a 496
4: 4 a 553
5: 5 a 444
6: 6 a 466
7: 7 a 525
8: 8 a 553
9: 9 a 541
...
Upvotes: 0