mbarete

data.table index of subset

Working with the data.table package in R, I'm trying to get the 'group number' of some data points. Specifically, my data are trajectories: many rows each describe one observation of the particle I'm tracking, and I want to generate an index for each trajectory based on other identifying information I have. With a [, , by] call I can group my data by this identifying information and isolate each trajectory. Is there a special symbol, similar to .I or .N, that gives what I would call the index of the subset?

Here's an example with toy data:

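library(data.table)

# x1 and x2 are the identifying columns; z stands in for the observed values
dt <- data.table(x1 = c(rep(1, 4), rep(2, 4)),
                 x2 = c(1, 1, 2, 2, 1, 1, 2, 2),
                 z  = runif(8))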

I need a fast way to get the trajectory index for each observation (for this example it should be c(1,1,2,2,3,3,4,4)), because my real data set is moderately large.

Answers (1)

akrun

If we need the trajectories (not sure exactly what that means) based on 'x2', we can use rleid:

dt[, Grp := rleid(x2)]
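A minimal sketch of this on the question's toy data. rleid(x2) gives the expected result here only because x2 happens to change value at every trajectory boundary. The second table, dt2, is a made-up counter-example (not from the question) in which two consecutive trajectories share the same x2 value and are therefore merged into one run.

```r
library(data.table)

# Toy data from the question
dt <- data.table(x1 = c(rep(1, 4), rep(2, 4)),
                 x2 = c(1, 1, 2, 2, 1, 1, 2, 2),
                 z  = runif(8))

# rleid() starts a new id every time x2 differs from the previous row
dt[, Grp := rleid(x2)]
print(dt$Grp)  # 1 1 2 2 3 3 4 4

# Hypothetical counter-example: (x1, x2) defines two trajectories,
# but x2 never changes, so rleid(x2) alone sees a single run
dt2 <- data.table(x1 = c(1, 1, 2, 2), x2 = c(1, 1, 1, 1))
dt2[, Grp := rleid(x2)]
print(dt2$Grp)  # 1 1 1 1
```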

Or, if we need group numbers based on both 'x1' and 'x2', .GRP can be used together with by:

dt[, Grp := .GRP, by = .(x1, x2)]
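For comparison with the .I and .N symbols mentioned in the question, here is a sketch that computes .GRP next to .N, and collects .I, within the same by grouping on the toy data. The column names Grp, n_obs and rows are illustrative choices, not anything data.table requires. With by (rather than keyby), groups are numbered in the order in which they first appear in dt.

```r
library(data.table)

dt <- data.table(x1 = c(rep(1, 4), rep(2, 4)),
                 x2 = c(1, 1, 2, 2, 1, 1, 2, 2),
                 z  = runif(8))

# .GRP: counter of the current group (1 for the first group, 2 for the next, ...)
# .N:   number of rows in the current group
dt[, `:=`(Grp = .GRP, n_obs = .N), by = .(x1, x2)]

# .I is different: it holds the row numbers of dt that belong to each group
rows_per_group <- dt[, .(rows = list(.I)), by = .(x1, x2)]

print(dt)
print(rows_per_group)
```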

Or, as @Frank mentioned, this can be done with rleid alone, without by:

dt[, Grp := rleid(x1,x2)]
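The two approaches agree on the toy data, but they differ when a trajectory's rows are not adjacent: rleid counts runs of consecutive identical (x1, x2) values, while .GRP numbers distinct (x1, x2) combinations. A sketch, where the row order used for shuffled is a made-up interleaving for illustration only:

```r
library(data.table)

dt <- data.table(x1 = c(rep(1, 4), rep(2, 4)),
                 x2 = c(1, 1, 2, 2, 1, 1, 2, 2),
                 z  = runif(8))

# Rows of each trajectory are contiguous, so both ids coincide
dt[, grp_rleid := rleid(x1, x2)]
dt[, grp_GRP := .GRP, by = .(x1, x2)]

# Hypothetical interleaved order: rows of the same trajectory are not adjacent
shuffled <- dt[c(1, 3, 2, 4, 5, 7, 6, 8), .(x1, x2)]
shuffled[, grp_rleid := rleid(x1, x2)]       # new id at every change of (x1, x2)
shuffled[, grp_GRP := .GRP, by = .(x1, x2)]  # same id for the same (x1, x2) pair

print(shuffled)
```

So rleid(x1, x2) is a reasonable choice when the rows are already ordered by trajectory (for example after setorder(dt, x1, x2)); otherwise .GRP with by is the safer option.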
