Bonnie
Bonnie

Reputation: 481

How to check if a vector contains n consecutive numbers

Suppose that my vector numbers contains c(1,2,3,5,7,8), and I wish to find if it contains 3 consecutive numbers, which in this case, are 1,2,3.

numbers = c(1,2,3,5,7,8)
difference = diff(numbers) //The difference output would be 1,1,2,2,1

To verify that there are 3 consecutive integers in my numbers vector, I've tried the following with little reward.

rep(1,2)%in%difference 

The above code works in this case, but if my difference vector = (1,2,2,2,1), it would still return TRUE even though the "1"s are not consecutive.

Upvotes: 31

Views: 30181

Answers (5)

flodel
flodel

Reputation: 89057

Benchmarks!

I am including a couple functions of mine. Feel free to add yours. To qualify, you need to write a general function that tells if a vector x contains n or more consecutive numbers. I provide a unit test function below.


The contenders:

flodel.filter <- function(x, n, incr = 1L) {
  if (n > length(x)) return(FALSE)
  x <- as.integer(x)
  is.cons <- tail(x, -1L) == head(x, -1L) + incr
  any(filter(is.cons, rep(1L, n-1L), sides = 1, method = "convolution") == n-1L,
      na.rm = TRUE)
}

flodel.which <- function(x, n, incr = 1L) {
  is.cons <- tail(x, -1L) == head(x, -1L) + incr
  any(diff(c(0L, which(!is.cons), length(x))) >= n)
}

thelatemail.rle <- function(x, n, incr = 1L) {
  result <- rle(diff(x))
  any(result$lengths >= n-1L  & result$values == incr)
}

improved.rle <- function(x, n, incr = 1L) {
  result <- rle(diff(as.integer(x)) == incr)
  any(result$lengths >= n-1L  & result$values)
}

carl.seqle <- function(x, n, incr = 1) {
  if(!is.numeric(x)) x <- as.numeric(x) 
  z <- length(x)  
  y <- x[-1L] != x[-z] + incr 
  i <- c(which(y | is.na(y)), z) 
  any(diff(c(0L, i)) >= n)
}

Unit tests:

check.fun <- function(fun)
  stopifnot(
    fun(c(1,2,3),   3),
   !fun(c(1,2),     3),
   !fun(c(1),       3),
   !fun(c(1,1,1,1), 3),
   !fun(c(1,1,2,2), 3),
    fun(c(1,1,2,3), 3)
  )

check.fun(flodel.filter)
check.fun(flodel.which)
check.fun(thelatemail.rle)
check.fun(improved.rle)
check.fun(carl.seqle)

Benchmarks:

x <- sample(1:10, 1000000, replace = TRUE)

library(microbenchmark)
microbenchmark(
  flodel.filter(x, 6),
  flodel.which(x, 6),
  thelatemail.rle(x, 6),
  improved.rle(x, 6),
  carl.seqle(x, 6),
  times = 10)

# Unit: milliseconds
#                   expr       min       lq   median       uq      max neval
#    flodel.filter(x, 6)  96.03966 102.1383 144.9404 160.9698 177.7937    10
#     flodel.which(x, 6) 131.69193 137.7081 140.5211 185.3061 189.1644    10
#  thelatemail.rle(x, 6) 347.79586 353.1015 361.5744 378.3878 469.5869    10
#     improved.rle(x, 6) 199.35402 200.7455 205.2737 246.9670 252.4958    10
#       carl.seqle(x, 6) 213.72756 240.6023 245.2652 254.1725 259.2275    10

Upvotes: 17

Nishanth
Nishanth

Reputation: 7130

After diff you can check for any consecutive 1s -

numbers = c(1,2,3,5,7,8)

difference = diff(numbers) == 1
## [1]  TRUE  TRUE FALSE FALSE  TRUE

## find alteast one consecutive TRUE
any(tail(difference, -1) &
    head(difference, -1))

## [1] TRUE

Upvotes: 11

thelatemail
thelatemail

Reputation: 93813

Using diff and rle, something like this should work:

result <- rle(diff(numbers))
any(result$lengths>=2 & result$values==1)
# [1] TRUE

In response to the comments below, my previous answer was specifically only testing for runs of length==3 excluding longer lengths. Changing the == to >= fixes this. It also works for runs involving negative numbers:

> numbers4 <- c(-2, -1, 0, 5, 7, 8)
> result <- rle(diff(numbers4))
> any(result$lengths>=2 & result$values==1)
[1] TRUE

Upvotes: 25

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

It's nice to see home-grown solutions here.

Fellow Stack Overflow user Carl Witthoft posted a function he named seqle() and shared it here.

The function looks like this:

seqle <- function(x,incr=1) { 
  if(!is.numeric(x)) x <- as.numeric(x) 
  n <- length(x)  
  y <- x[-1L] != x[-n] + incr 
  i <- c(which(y|is.na(y)),n) 
  list(lengths = diff(c(0L,i)),
       values = x[head(c(0L,i)+1L,-1L)]) 
} 

Let's see it in action. First, some data:

numbers1 <- c(1, 2, 3, 5, 7, 8)
numbers2 <- c(-2, 2, 3, 5, 6, 7, 8)
numbers3 <- c(1, 2, 2, 2, 1, 2, 3)

Now, the output:

seqle(numbers1)
# $lengths
# [1] 3 1 2
# 
# $values
# [1] 1 5 7
# 
seqle(numbers2)
# $lengths
# [1] 1 2 4
# 
# $values
# [1] -2  2  5
# 
seqle(numbers3)
# $lengths
# [1] 2 1 1 3
# 
# $values
# [1] 1 2 2 1
# 

Of particular interest to you is the "lengths" in the result.

Another interesting point is the incr argument. Here we can set the increment to, say, "2" and look for sequences where the difference between the numbers are two. So, for the first vector, we would expect the sequence of 3, 5, and 7 to be detected.

Let's try:

> seqle(numbers1, incr = 2)
$lengths
[1] 1 1 3 1

$values
[1] 1 2 3 8

So, we can see that we have a sequence of 1 (1), 1 (2), 3 (3, 5, 7), and 1 (8) if we set incr = 2.


How does it work with ECII's second challenge? Seems OK!

> numbers4 <- c(-2, -1, 0, 5, 7, 8)
> seqle(numbers4)
$lengths
[1] 3 1 2

$values
[1] -2  5  7

Upvotes: 7

ECII
ECII

Reputation: 10629

Simple but works

numbers = c(-2,2,3,4,5,10,6,7,8)
x1<-c(diff(numbers),0)
x2<-c(0,diff(numbers[-1]),0)
x3<-c(0,diff(numbers[c(-1,-2)]),0,0)

rbind(x1,x2,x3)
colSums(rbind(x1,x2,x3) )==3 #Returns TRUE or FALSE where in the vector the consecutive intervals triplet takes place
[1] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE

sum(colSums(rbind(x1,x2,x3) )==3) #How many triplets of consecutive intervals occur in the vector
[1] 3

which(colSums(rbind(x1,x2,x3) )==3) #Returns the location of the triplets consecutive integers
[1] 2 3 7

Note that this will not work for consecutive negative intervals c(-2,-1,0) because of how diff() works

Upvotes: 5

Related Questions