ekstroem

Reputation: 6171

data.table summary statistics from the first n observations per group

I'd like to use data.table to compute summary statistics from only the first n observations in each group. The solution below works, but I have a nagging feeling that it could be written as a one-liner in data.table, and I cannot figure out how.

library(data.table)
DT <- data.table(y=1:10, grp=rep(1:2,5))

This produces

     y grp
 1:  1   1
 2:  2   2
 3:  3   1
 4:  4   2
 5:  5   1
 6:  6   2
 7:  7   1
 8:  8   2
 9:  9   1
10: 10   2

and I want to compute summary statistics of y based on, say, the first two observations in each group. The following command gives me the within-group index:

DT2 <- DT[, .(idx = 1:.N, y), by=grp]

which yields

    grp idx  y
 1:   1   1  1
 2:   1   2  3
 3:   1   3  5
 4:   1   4  7
 5:   1   5  9
 6:   2   1  2
 7:   2   2  4
 8:   2   3  6
 9:   2   4  8
10:   2   5 10

and then I can use data.table again to create the summary from the relevant rows:

DT2[idx<3, .(my = mean(y)), by=grp]

to get

   grp my
1:   1  2
2:   2  3

Is it possible to write this as a single call to data.table?

Upvotes: 1

Views: 75

Answers (1)

NGaffney

Reputation: 1532

The one-call solution is to subset y inside j, so that mean() only sees the first two values of each group:

DT[, .(my = mean(y[1:2])), by = grp]
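One caveat worth noting: if a group has fewer than two rows, y[1:2] pads with NA and the mean becomes NA. A minimal sketch of two variants that avoid this, assuming the same DT as in the question (n, res_head and res_rowid are illustrative names, not part of the original post):

```r
library(data.table)

DT <- data.table(y = 1:10, grp = rep(1:2, 5))
n <- 2  # hypothetical: number of leading observations to use per group

# head() returns at most n values, so a short group is averaged
# over whatever rows it has instead of being padded with NA
res_head <- DT[, .(my = mean(head(y, n))), by = grp]

# Alternative: rowid() numbers the rows within each group, so the
# filter can go in i and the summary in j, still in a single call
res_rowid <- DT[rowid(grp) <= n, .(my = mean(y)), by = grp]
```

Both should give the same result as the two-step version in the question for this data. The head() form keeps the selection next to the statistic, while the rowid() form is convenient when several columns are summarised from the same leading rows.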

Upvotes: 3
