Group-by in data.table with choosing first element in multiple columns

Question

Given the data table:

DT <- data.table(C1=1:6,C2=10:15,C3=1:3,C4=c("A","B")) 

   C1 C2 C3 C4
1:  1 10  1  A
2:  2 11  2  B
3:  3 12  3  A
4:  4 13  1  B
5:  5 14  2  A
6:  6 15  3  B

If I want to group by C4, compute the mean on C3 and display the first element in C1, I would do this:

DT[, list(C1=C1[1], AVG_C3=mean(C3), Freq=.N), by=C4]
   C4 C1 AVG_C3 Freq
1:  A  1      2    3
2:  B  2      2    3

If I would like to get the first element in C2, then I add C2[1] in the above line:

DT[, list(C1=C1[1], C2=C2[1], AVG_C3=mean(C3), Freq=.N), by=C4]

How would I tackle this task given the fact that I have hundreds of columns and I wish to see the first element as explained above?

Maybe there is some simple way like working with a vector of desired columns cols <- c("C1","C2"), but I can't figure it out how to assign to cols within DT[]. I had a look at assigning to multiple columns here Assign multiple columns using := in data.table, by group , but I'm not sure how to implement it for this case. I would appreciate your feedback :)

akuiper · Accepted Answer

Suppose you want to see the first element of C1, C2 and C3 respectively, you can use the head on the .SD and specify the column names using .SDcols.

cols <- c("C1", "C2", "C3")
DT[, c(head(.SD, 1), list(AVG_C3=mean(C3), Freq=.N)), by=C4, .SDcols = cols]

   C4 C1 C2 C3 AVG_C3 Freq
1:  A  1 10  1      2    3
2:  B  2 11  2      2    3

Group-by in data.table with choosing first element in multiple columns

Answers (2)

Related Questions