MLE

Reputation: 1043

Check by group whether each number in a sequence is within 1 of the previous number

Let t = 1, 2, 3, ..., T index time (the order of the observations). Within each group, every value X observed at time t must lie in the set {K-1, K, K+1}, where K is the value at t-1. For example, if the previous value is K=4, the next value X meets the requirement if it is 3, 4, or 5. If every X in a group's sequence meets the requirement, the group meets the requirement and is labeled TRUE.

I know a for loop can do the trick, but I have a large number of observations and looping over them is very slow. I know that cummax can quickly find non-decreasing sequences. Is there a similarly quick solution for this problem?

seq <- c(1,2,1,2,3,1,2,3,1,2,1,2,2,3,4)
group <- rep(letters[1:3],each=5)
dt <- data.frame(group,seq)

> dt
  group seq
1      a   1
2      a   2
3      a   1
4      a   2
5      a   3
6      b   1
7      b   2
8      b   3
9      b   1
10     b   2
11     c   1
12     c   2
13     c   2
14     c   3
15     c   4

The desired output:

y label
a:true
b:false
c:true

Upvotes: 0

Views: 110

Answers (4)

akrun

Reputation: 886948

We can also use aggregate from base R:

aggregate(seq ~ group, dt,
          FUN = function(x) all(c(TRUE, abs(x[-1] - x[-length(x)]) <= 1)))
#  group   seq
#1     a  TRUE
#2     b FALSE
#3     c  TRUE

Upvotes: 1

lmo

Reputation: 38500

Here is a base R example with aggregate and diff:

    aggregate(c(1, abs(diff(dt$seq)) * (tail(dt$group, -1) ==
                                        head(dt$group, -1))),
              dt["group"], function(i) max(i) < 2)

  group     x
1     a  TRUE
2     b FALSE
3     c  TRUE

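The first argument to aggregate is a vector built with diff: each absolute difference is multiplied by a logical that is TRUE when the two adjacent rows are in the same group, so differences that straddle a group boundary are set to zero. The leading 1 stands in for the first row, which has no predecessor.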
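
To make the masking concrete, the sketch below rebuilds that first argument step by step on the question's data. The names `d`, `same_group`, and `masked` are introduced here only for illustration and do not appear in the answer's code.

```r
seq <- c(1,2,1,2,3,1,2,3,1,2,1,2,2,3,4)
group <- rep(letters[1:3], each = 5)
dt <- data.frame(group, seq)

# Absolute step between each row and the row before it (length n - 1)
d <- abs(diff(dt$seq))
# TRUE where a row and its predecessor belong to the same group
same_group <- tail(dt$group, -1) == head(dt$group, -1)
# Steps across a group boundary are multiplied by FALSE, i.e. set to 0;
# the leading 1 is a placeholder for the first row, which has no predecessor
masked <- c(1, d * same_group)
masked
# [1] 1 1 1 1 1 0 1 1 2 1 0 1 0 1 1
```

Only group b (positions 6 to 10) contains a value of 2, which is why `max(i) < 2` is FALSE for it. Note that this appears to rely on the rows of each group being contiguous, as they are in the example data.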

Upvotes: 1

Ernest A

Reputation: 7839

You can do:

is.sequence <- function(x)
    all(apply(head(cbind(x-1, x, x+1), -1) - x[-1] == 0, 1, any))

tapply(dt$seq, dt$group, is.sequence)
#    a     b     c 
# TRUE FALSE  TRUE 
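
As a rough illustration of how `is.sequence` arrives at its result, the sketch below evaluates the intermediate steps for group b only. The names `x`, `candidates`, and `hits` are chosen here for readability and are not part of the function above.

```r
x <- c(1, 2, 3, 1, 2)   # group b from the question

# One row per element except the last; columns are K-1, K, K+1
candidates <- head(cbind(x - 1, x, x + 1), -1)
# x[-1] is recycled down each column, so row i is compared with x[i + 1]
hits <- candidates - x[-1] == 0
# A row passes if the next value equals any of its three candidates
passes <- apply(hits, 1, any)
passes
# [1]  TRUE  TRUE FALSE  TRUE
```

The third row fails because the value after 3 is 1, which is not in {2, 3, 4}. Because the test is exact equality with K-1, K, or K+1, this approach assumes integer-valued sequences; for integers it agrees with the `abs(diff(x)) <= 1` checks in the other answers.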

Upvotes: 2

akuiper

Reputation: 214927

You can use the diff function to check whether adjacent values in each group differ by at most 1:

library(dplyr)
dt %>% group_by(group) %>% summarize(label = all(abs(diff(seq)) <= 1))

# A tibble: 3 x 2
#   group label
#  <fctr> <lgl>
#1      a  TRUE
#2      b FALSE
#3      c  TRUE

Here is the corresponding data.table version:

library(data.table)
setDT(dt)[, .(label = all(abs(diff(seq)) <= 1)), .(group)]
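
For readers less familiar with diff, this base R sketch shows what the grouped expression evaluates to inside a single group. `within_one` is a helper name made up here for illustration; it is not part of dplyr or data.table.

```r
# Hypothetical helper wrapping the expression used in both versions above
within_one <- function(x) all(abs(diff(x)) <= 1)

diff(c(1, 2, 3, 1, 2))        # group b: successive differences
# [1]  1  1 -2  1
within_one(c(1, 2, 3, 1, 2))  # the step of -2 breaks the rule
# [1] FALSE
within_one(c(1, 2, 2, 3, 4))  # group c
# [1] TRUE
within_one(7)                 # single observation: diff() is empty, so all() is TRUE
# [1] TRUE
```

A group with a single row is therefore labeled TRUE, which seems consistent with the requirement since there is no previous value to violate it.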

Upvotes: 3
