Reputation: 1327
I want to separate a data.table
into groups based on the continuity of one variable.
So to speak, from this data.table
:
DT <- data.table(Var1 = c(1:5, 7:10))
I want it to be grouped like this:
# Var1 group
# 1: 1 1 # 1 to 5 is continuous with a maximal difference of 1
# 2: 2 1
# 3: 3 1
# 4: 4 1
# 5: 5 1
# 6: 7 2 # 6 to 10 is continuous again
# 7: 8 2
# 8: 9 2
# 9: 10 2
The difference of Var1
should not be limited to one like in this minimal example, but be adjustable, so that DT <- data.table(Var1 = c(seq(1,10, 2), seq(13,30, 2)))
will also be separated into two groups when given a maximal difference of 2.
EDIT:
I should clarify that a 'maximal difference' of 2 or more is meant in a way that differences in Var1
smaller than two should be treated as 'continuous'. Furthermore the variable Var1
should not be limited to integer values. The last thing could be avoided by multiplying e.g. 0.14 by 100 to get 14 and also multiplying 'maximal difference' by 100.
Upvotes: 1
Views: 134
Reputation: 17648
A base R solution.
foo <- function(x){
gr <- which(!(duplicated(diff(x)) | duplicated(diff(x), fromLast = T)))
if(length(gr) == 1){
cbind(Var1=x,group=rep(1:(length(gr)+1), c(min(gr),length(x)-max(gr))))
}else{
cbind(Var1=x,group=rep(1:(length(gr)+1), c(min(gr), diff(gr),length(x)-max(gr))))
}
}
All kind of differences are working.
foo(c(seq(1,10, 2), seq(13,30, 2)))
Var1 group
[1,] 1 1
[2,] 3 1
[3,] 5 1
[4,] 7 1
[5,] 9 1
[6,] 13 2
[7,] 15 2
[8,] 17 2
[9,] 19 2
[10,] 21 2
[11,] 23 2
[12,] 25 2
[13,] 27 2
[14,] 29 2
Three groups are working as well.
foo(c(1:5, 7:10, 13:20))
Var1 group
[1,] 1 1
[2,] 2 1
[3,] 3 1
[4,] 4 1
[5,] 5 1
[6,] 7 2
[7,] 8 2
[8,] 9 2
[9,] 10 2
[10,] 13 3
[11,] 14 3
[12,] 15 3
[13,] 16 3
[14,] 17 3
[15,] 18 3
[16,] 19 3
[17,] 20 3
For a data.table
you can try:
foo <- function(x){
gr <- which(!(duplicated(diff(x)) | duplicated(diff(x), fromLast = T)))
if(length(gr) == 1){
rep(1:(length(gr)+1), c(min(gr),length(x)-max(gr)))
}else{
rep(1:(length(gr)+1), c(min(gr), diff(gr),length(x)-max(gr)))
}
}
DT[, group := foo(Var1)]
Upvotes: 1
Reputation: 132706
DT[, group := rleid(cumprod(c(1, diff(Var1))))]
# Var1 group
#1: 1 1
#2: 2 1
#3: 3 1
#4: 4 1
#5: 5 1
#6: 7 2
#7: 8 2
#8: 9 2
#9: 10 2
step <- 2
DT <- data.table(Var1 = c(seq(1,10, 2), seq(13,30, 2)))
DT[, group := rleid(cumsum(c(FALSE, diff(Var1) != step)))]
# Var1 group
# 1: 1 1
# 2: 3 1
# 3: 5 1
# 4: 7 1
# 5: 9 1
# 6: 13 2
# 7: 15 2
# 8: 17 2
# 9: 19 2
#10: 21 2
#11: 23 2
#12: 25 2
#13: 27 2
#14: 29 2
Upvotes: 3