Svetlana Ko

Reputation: 35

Aggregate data.table in R

I have a large data table that needs to be aggregated by one variable (ID). Variable Vb should be aggregated as a sum, but variable Vc should just keep its value, since it is the same for every row of a given ID (similar to aggregation by first value in SPSS).

library(data.table)

DT <- data.table(ID = c(11, 11, 22, 22, 22, 44, 55, 55, 55),
                 Vb = c(50, 40, 4, 3, 2, 8, 9, 11, 5),
                 Vc = c(1, 1, 3, 3, 3, 1, 2, 2, 2))

I have approximately 15 variables to aggregate (half of them by sum, the others by keeping their value), so the most efficient approach would be appreciated!

Upvotes: 0

Views: 2617

Answers (2)

Saurabh Chauhan

Reputation: 3211

Using sqldf:

We can group by ID and sum(Vb) as below:

library(sqldf)
sqldf("select ID, sum(Vb), Vc from DT group by ID")      # if Vc is constant within each ID

OR

sqldf("select ID, sum(Vb), Vc from DT group by ID, Vc")  # if Vc can vary within an ID (one row per ID/Vc pair)

Output:

  ID sum(Vb) Vc
1 11      90  1
2 22       9  3
3 44       8  1
4 55      25  2
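With around 15 columns, typing the SQL by hand gets error-prone. A minimal sketch, assuming the same data as a plain data.frame (sqldf also accepts the data.table) and using hypothetical vectors `sum_cols` and `keep_cols` that you would fill with the real column names, builds the select list programmatically:

```r
library(sqldf)

DT <- data.frame(ID = c(11, 11, 22, 22, 22, 44, 55, 55, 55),
                 Vb = c(50, 40, 4, 3, 2, 8, 9, 11, 5),
                 Vc = c(1, 1, 3, 3, 3, 1, 2, 2, 2))

# Hypothetical column lists: extend them with the real variable names.
sum_cols  <- c("Vb")   # columns to add up per ID
keep_cols <- c("Vc")   # columns assumed constant within each ID

# Build "ID, sum(Vb) AS Vb, Vc". Kept columns are selected bare, which SQLite
# allows in a grouped query; this is only safe because they are constant per ID.
select_list <- paste(c("ID",
                       sprintf("sum(%s) AS %s", sum_cols, sum_cols),
                       keep_cols),
                     collapse = ", ")
query <- sprintf("select %s from DT group by ID order by ID", select_list)

res <- sqldf(query)
print(res)
```

The `order by ID` is there because SQL does not guarantee the row order of a `group by` result otherwise.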

Upvotes: 1

bobbel
bobbel

Reputation: 2031

This should work (if Vc is truly constant within each ID):

library(data.table)
DT[, .(Vb = sum(Vb), Vc = unique(Vc)), by = ID]  # one row per ID only if Vc is constant within ID
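Since the question mentions about 15 columns, one way to scale the answer above without naming each column in `j` is data.table's `.SD` with `.SDcols`. This is a sketch, not the answerer's code: `sum_cols` and `first_cols` are hypothetical vectors you would fill with the real names, and it takes the first value per group (`x[1L]`) instead of `unique()`, so each ID always yields exactly one row.

```r
library(data.table)

DT <- data.table(ID = c(11, 11, 22, 22, 22, 44, 55, 55, 55),
                 Vb = c(50, 40, 4, 3, 2, 8, 9, 11, 5),
                 Vc = c(1, 1, 3, 3, 3, 1, 2, 2, 2))

# Hypothetical column lists: in the real table these would hold the
# ~15 variable names, split by how each one should be aggregated.
sum_cols   <- c("Vb")
first_cols <- c("Vc")

# Sum every column in sum_cols within each ID.
sums <- DT[, lapply(.SD, sum), by = ID, .SDcols = sum_cols]

# Keep the first value of every column in first_cols within each ID
# (like SPSS "first"); assumes these columns are constant per ID.
firsts <- DT[, lapply(.SD, function(x) x[1L]), by = ID, .SDcols = first_cols]

# Combine both partial results into one row per ID (sorted by ID).
res <- merge(sums, firsts, by = "ID")
print(res)
```

If a "keep" column might actually vary within an ID, taking the first value silently hides that; running `DT[, uniqueN(Vc), by = ID]` beforehand is one way to check the assumption.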

Upvotes: 2
