I'd like to use data.table to compute summary statistics based on only the first n observations found for each group. I have one solution that works (shown below), but I have a nagging feeling that it could be written as a one-liner in data.table, and I cannot figure out how.
library(data.table)
DT <- data.table(y=1:10, grp=rep(1:2,5))
This produces
     y grp
 1:  1   1
 2:  2   2
 3:  3   1
 4:  4   2
 5:  5   1
 6:  6   2
 7:  7   1
 8:  8   2
 9:  9   1
10: 10   2
and I basically want to compute summary statistics of y based on, say, the first two observations in each group. The following command gives me the row index within each group:
DT2 <- DT[, .(idx = 1:.N, y), by=grp]
which yields
    grp idx  y
 1:   1   1  1
 2:   1   2  3
 3:   1   3  5
 4:   1   4  7
 5:   1   5  9
 6:   2   1  2
 7:   2   2  4
 8:   2   3  6
 9:   2   4  8
10:   2   5 10
and then I can use data.table again to compute the summary on the relevant selection:
DT2[idx<3, .(my = mean(y)), by=grp]
to get
   grp my
1:   1  2
2:   2  3
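For reference, the two steps above can be chained into one expression, although that is still two data.table calls rather than one. A minimal sketch, assuming the same DT as defined earlier (the name res is only for illustration, and seq_len(.N) stands in for 1:.N, which is equivalent here):

```r
library(data.table)

# Same example data as above
DT <- data.table(y = 1:10, grp = rep(1:2, 5))

# Step 1: add a within-group row index; step 2 (chained): keep the
# first two rows of each group and average y by group
res <- DT[, .(idx = seq_len(.N), y), by = grp][idx < 3, .(my = mean(y)), by = grp]
print(res)
```

This gives the same result as the two separate statements, but it still builds the intermediate indexed table.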
Is it possible to write this as a single call to data.table?