Reputation: 6436
Given the data table:
DT <- data.table(C1=1:6,C2=10:15,C3=1:3,C4=c("A","B"))
C1 C2 C3 C4
1: 1 10 1 A
2: 2 11 2 B
3: 3 12 3 A
4: 4 13 1 B
5: 5 14 2 A
6: 6 15 3 B
If I want to group by C4, compute the mean on C3 and display the first element in C1, I would do this:
DT[, list(C1=C1[1], AVG_C3=mean(C3), Freq=.N), by=C4]
C4 C1 AVG_C3 Freq
1: A 1 2 3
2: B 2 2 3
If I would like to get the first element in C2, then I add C2[1]
in the above line:
DT[, list(C1=C1[1], C2=C2[1], AVG_C3=mean(C3), Freq=.N), by=C4]
How would I tackle this task given the fact that I have hundreds of columns and I wish to see the first element as explained above?
Maybe there is some simple way like working with a vector of desired columns cols <- c("C1","C2")
, but I can't figure it out how to assign to cols
within DT[]
. I had a look at assigning to multiple columns here Assign multiple columns using := in data.table, by group , but I'm not sure how to implement it for this case. I would appreciate your feedback :)
Upvotes: 2
Views: 842
Reputation: 887213
We can do the summary statistics on specified columns by 'C4' and join on
the original dataset to get the corresponding rows.
DT[DT[, .(Avg_C3= mean(C3), freq= .N, C1 = C1[1L]) , by = C4], on = c("C4", "C1")]
# C1 C2 C3 C4 Avg_C3 freq
#1: 1 10 1 A 2 3
#2: 2 11 2 B 2 3
Or an option using dplyr
library(dplyr)
DT %>%
group_by(C4) %>%
mutate(Avg_C3 = mean(C3), freq = n(), C1 = first(C1)) %>%
slice(1L)
# C1 C2 C3 C4 Avg_C3 freq
# <int> <int> <int> <chr> <dbl> <int>
#1 1 10 1 A 2 3
#2 2 11 2 B 2 3
Upvotes: 2
Reputation: 214977
Suppose you want to see the first element of C1
, C2
and C3
respectively, you can use the head
on the .SD
and specify the column names using .SDcols
.
cols <- c("C1", "C2", "C3")
DT[, c(head(.SD, 1), list(AVG_C3=mean(C3), Freq=.N)), by=C4, .SDcols = cols]
C4 C1 C2 C3 AVG_C3 Freq
1: A 1 10 1 2 3
2: B 2 11 2 2 3
Upvotes: 4