Reputation: 35
I have a large data table which needs to be aggregated by one variable (ID). Variable Vb should be aggregated as a sum, but variable Vc should just keep its value, since it has the same value for every row of a given ID (similar to aggregation by first value in SPSS).
library(data.table)
DT <- data.table(ID = c(11, 11, 22, 22, 22, 44, 55, 55, 55),
                 Vb = c(50, 40, 4, 3, 2, 8, 9, 11, 5),
                 Vc = c(1, 1, 3, 3, 3, 1, 2, 2, 2))
I have approximately 15 variables to aggregate (half of them by sum, others by value), so the most efficient way would be appreciated!
Upvotes: 0
Views: 2617
Reputation: 3211
Using sqldf: we can group by ID and take sum(Vb) as below:
library(sqldf)
sqldf("select ID, sum(Vb), Vc from DT group by ID") # if Vc has a single value per ID
OR
sqldf("select ID, sum(Vb), Vc from DT group by ID, Vc") # if Vc can vary within an ID
Output:
  ID sum(Vb) Vc
1 11      90  1
2 22       9  3
3 44       8  1
4 55      25  2
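With around 15 variables, typing the query by hand gets tedious, so one option is to build it from two vectors of column names. This is only a sketch: `sum_cols` and `first_cols` are hypothetical names introduced here, and it assumes every column in `first_cols` really is constant within each ID, so those columns can simply be added to the `group by` clause.

```r
library(data.table)
library(sqldf)

DT <- data.table(ID = c(11, 11, 22, 22, 22, 44, 55, 55, 55),
                 Vb = c(50, 40, 4, 3, 2, 8, 9, 11, 5),
                 Vc = c(1, 1, 3, 3, 3, 1, 2, 2, 2))

# Hypothetical helper vectors: extend them to cover all ~15 columns
sum_cols   <- c("Vb")  # columns to aggregate with sum()
first_cols <- c("Vc")  # columns assumed constant within each ID

# "sum(Vb) as Vb" for every summed column
select_sum <- paste0("sum(", sum_cols, ") as ", sum_cols, collapse = ", ")
# ID plus the constant columns form the grouping key
group_cols <- paste(c("ID", first_cols), collapse = ", ")

query <- sprintf("select %s, %s from DT group by %s",
                 group_cols, select_sum, group_cols)
res_sql <- sqldf(query)
```

For the sample data the generated query is `select ID, Vc, sum(Vb) as Vb from DT group by ID, Vc`.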
Upvotes: 1
Reputation: 2031
This should work (if Vc truly has a single value per ID):
DT[, .(Vb=sum(Vb), Vc=unique(Vc)), by=ID]
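To scale this to the roughly 15 columns mentioned in the question, a possible sketch is to list the columns in two character vectors and let `.SDcols` pick the ones to sum. The names `sum_cols` and `first_cols` are hypothetical, and the approach assumes the `first_cols` columns are constant within each ID, so adding them to `by` does not change the groups.

```r
library(data.table)

DT <- data.table(ID = c(11, 11, 22, 22, 22, 44, 55, 55, 55),
                 Vb = c(50, 40, 4, 3, 2, 8, 9, 11, 5),
                 Vc = c(1, 1, 3, 3, 3, 1, 2, 2, 2))

# Hypothetical helper vectors: extend them to cover all ~15 columns
sum_cols   <- c("Vb")  # columns to aggregate with sum()
first_cols <- c("Vc")  # columns assumed constant within each ID

# Group by ID plus the constant columns; sum only the .SDcols columns
res <- DT[, lapply(.SD, sum), by = c("ID", first_cols), .SDcols = sum_cols]
```

The result has the grouping columns first (ID, Vc) followed by the summed columns. If a column in `first_cols` turns out not to be constant within an ID, that ID is split into several rows instead of raising an error, so it is worth checking the row count against `uniqueN(DT$ID)`.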
Upvotes: 2