Group data.table by continuity of variable

Question

I want to separate a data.table into groups based on the continuity of one variable. So to speak, from this data.table:

DT <- data.table(Var1 = c(1:5, 7:10))

I want it to be grouped like this:

#    Var1 group
# 1:    1     1 # 1 to 5 is continuous with a maximal difference of 1
# 2:    2     1
# 3:    3     1
# 4:    4     1
# 5:    5     1
# 6:    7     2 # 6 to 10 is continuous again
# 7:    8     2
# 8:    9     2
# 9:   10     2

The difference of Var1 should not be limited to one like in this minimal example, but be adjustable, so that DT <- data.table(Var1 = c(seq(1,10, 2), seq(13,30, 2))) will also be separated into two groups when given a maximal difference of 2.

EDIT: I should clarify that a 'maximal difference' of 2 or more is meant in a way that differences in Var1 smaller than two should be treated as 'continuous'. Furthermore the variable Var1 should not be limited to integer values. The last thing could be avoided by multiplying e.g. 0.14 by 100 to get 14 and also multiplying 'maximal difference' by 100.

Roland · Accepted Answer

DT[, group := rleid(cumprod(c(1, diff(Var1))))]
#   Var1 group
#1:    1     1
#2:    2     1
#3:    3     1
#4:    4     1
#5:    5     1
#6:    7     2
#7:    8     2
#8:    9     2
#9:   10     2

step <- 2
DT <- data.table(Var1 = c(seq(1,10, 2), seq(13,30, 2)))
DT[, group := rleid(cumsum(c(FALSE, diff(Var1) != step)))]
#    Var1 group
# 1:    1     1
# 2:    3     1
# 3:    5     1
# 4:    7     1
# 5:    9     1
# 6:   13     2
# 7:   15     2
# 8:   17     2
# 9:   19     2
#10:   21     2
#11:   23     2
#12:   25     2
#13:   27     2
#14:   29     2

Group data.table by continuity of variable

Answers (2)

Related Questions