Reputation: 1043
Let t = 1, 2, 3, ..., T be the time (sequence order number). For each group, at each t where an observation occurs, the sequence value X must lie in the set {K-1, K, K+1}, where K is the previous sequence value at t-1. For example, if the previous value is K = 4, the next value X meets the requirement if it falls within {3, 4, 5}. If every X in a group's sequence meets the requirement, the group meets the requirement and is labeled TRUE.
I know a for loop can do the trick, but I have a large number of observations and a loop is very slow. I know that cummax can quickly identify a non-decreasing sequence. Is there a similarly quick solution for this problem?
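As a reference point, the rule described above can be written as a plain loop. This is only a sketch of the slow baseline, not code from the original post; the helper name within_one_loop and the variables vals and grp are hypothetical and simply mirror the example data.

```r
# Hypothetical helper (the name is illustrative): returns TRUE when every
# value differs from the value before it by at most 1.
within_one_loop <- function(x) {
  if (length(x) < 2) return(TRUE)   # nothing to compare against
  for (i in 2:length(x)) {
    if (abs(x[i] - x[i - 1]) > 1) return(FALSE)
  }
  TRUE
}

# Same values as the example data, under illustrative names.
vals <- c(1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 1, 2, 2, 3, 4)
grp  <- rep(letters[1:3], each = 5)

# Apply the check to each group separately.
res <- sapply(split(vals, grp), within_one_loop)
```

Group b fails because its values jump from 3 to 1. The loop visits every element one at a time, which is the cost a vectorized solution would avoid.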
seq <- c(1,2,1,2,3,1,2,3,1,2,1,2,2,3,4)
group <- rep(letters[1:3],each=5)
dt <- data.frame(group,seq)
> dt
   group seq
1      a   1
2      a   2
3      a   1
4      a   2
5      a   3
6      b   1
7      b   2
8      b   3
9      b   1
10     b   2
11     c   1
12     c   2
13     c   2
14     c   3
15     c   4
The desired output:
y label
a:true
b:false
c:true
Upvotes: 0
Views: 110
Reputation: 886948
We can also use aggregate from base R:
aggregate(seq~group,dt, FUN = function(x) all(c(TRUE,
abs((x[-1] - x[-length(x)])) <=1)))
# group seq
#1 a TRUE
#2 b FALSE
#3 c TRUE
Upvotes: 1
Reputation: 38500
Here is a base R example with aggregate and diff:
aggregate(c(1, abs(diff(dt$seq)) * (tail(dt$group, -1) ==
head(dt$group, -1))),
dt["group"], function(i) max(i) < 2)
group x
1 a TRUE
2 b FALSE
3 c TRUE
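The first argument to aggregate is a vector built with diff. Each absolute difference is multiplied by a logical that is TRUE only when the two adjacent elements belong to the same group, so differences that straddle a group boundary are set to zero. The leading 1 pads the vector back to the original length, and a group passes when its largest value is below 2.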
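To make the masking step easier to follow, the intermediate vectors can be computed one at a time. This is a sketch on the example data; the names step, same, and masked are illustrative, and tapply is used here only as a compact stand-in for the aggregate call.

```r
seq   <- c(1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 1, 2, 2, 3, 4)
group <- rep(letters[1:3], each = 5)

# Absolute change between each pair of neighbouring values (length 14).
step <- abs(diff(seq))

# TRUE when both neighbours are in the same group, FALSE at a group boundary.
same <- tail(group, -1) == head(group, -1)

# Zero out the boundary differences; the leading 1 restores length 15.
masked <- c(1, step * same)

# A group passes when no within-group step exceeds 1.
res <- tapply(masked, group, function(i) max(i) < 2)
```

On this data the jump from 3 (last value of group a) to 1 (first value of group b) is zeroed, so it does not count against either group, while the 3 to 1 jump inside group b is kept and makes that group fail.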
Upvotes: 1
Reputation: 7839
You can do:
is.sequence <- function(x)
all(apply(head(cbind(x-1, x, x+1), -1) - x[-1] == 0, 1, any))
tapply(dt$seq, dt$group, is.sequence)
# a b c
# TRUE FALSE TRUE
Upvotes: 2
Reputation: 214927
You can use the diff function to check whether adjacent values satisfy the condition:
library(dplyr)
dt %>% group_by(group) %>% summarize(label = all(abs(diff(seq)) <= 1))
# A tibble: 3 x 2
# group label
# <fctr> <lgl>
#1 a TRUE
#2 b FALSE
#3 c TRUE
Here is the corresponding data.table version:
library(data.table)
setDT(dt)[, .(label = all(abs(diff(seq)) <= 1)), .(group)]
Upvotes: 3