dww
dww

Reputation: 31452

find points in vector where change is greater than threshold

I want to find positions in a vector where the value differs by more than some threshold value from an earlier point in the vector. The first change-point should be measured relative to the first value in the vector. Subsequent change-points should be measured relative to the previous change-point.

I can do this using a for loop, but I wonder if there is a more idiomatic and faster vectorised soultion.

Minimal example:

set.seed(123)
x = cumsum(rnorm(500))

mindiff = 5.0
start = x[1]
changepoints = integer()

for (i in 1:length(x)) {
  if (abs(x[i] - start) > mindiff) {
    changepoints = c(changepoints, i)
    start = x[i]
  }
}

plot(x, type = 'l')
points(changepoints, x[changepoints], col='red')

enter image description here

Upvotes: 7

Views: 702

Answers (3)

dww
dww

Reputation: 31452

For completeness, using recursion, we can get an answer that uses only R vectorised functions. However, this will not work on large vectors of results. E.g. In the OP example we get an "evaluation nested too deeply" error when length(x) == 1e5

N = length(x)
f.recurs = function(x, mindiff, i=1) {
  next.i = i + which(abs(x[i:N]-x[i]) > mindiff)[1] - 1L
  if (!is.na(next.i)) c(next.i, f.recurs(x, mindiff, next.i))
  else NULL
}

f.recurs(x, 5.0)
# [1]  17  25  56  98 108 144 288 297 307 312 403 470 487

Upvotes: 0

Marcelo
Marcelo

Reputation: 4282

Here is a solution just using baseR Reduce. Using the argument accumulate = TRUE, reduce returns the result of every call to the function. In our case it will represent start value of the solution using the for loop. Once you have this vector, we only need to find the indexes where the value changes:

#Find the changepoints
r <- Reduce(function(a,e) {
  if (abs(e - a) > mindiff)
    e
  else 
    a
  }, x, accumulate =T)

# Get the indexes using diff
# changepoints <- head(cumsum(c(1,rle(r)$lengths)),-1)
changepoints <- which(!diff(r) == 0) + 1

EDIT: I have updated the answer using @Eric Watt's comment.

Upvotes: 3

d.b
d.b

Reputation: 32548

Implementing the same code in Rcpp can help with speed.

library(Rcpp)
cppFunction(
  "IntegerVector foo(NumericVector vect, double difference){
    int start = 0;
    IntegerVector changepoints;
    for (int i = 0; i < vect.size(); i++){
      if((vect[i] - vect[start]) > difference || (vect[start] - vect[i]) > difference){
        changepoints.push_back (i+1);
        start = i;        
      }
    }
    return(changepoints);
  }"
  )

foo(vect = x, difference = mindiff)
# [1]  17  25  56  98 108 144 288 297 307 312 403 470 487

identical(foo(vect = x, difference = mindiff), changepoints)
#[1] TRUE

Benchmarking

#DATA
set.seed(123)
x = cumsum(rnorm(1e5))
mindiff = 5.0

library(microbenchmark)
microbenchmark(baseR = {start = x[1]
changepoints = integer()

for (i in 1:length(x)) {
    if (abs(x[i] - start) > mindiff) {
        changepoints = c(changepoints, i)
        start = x[i]
    }
}}, Rcpp = foo(vect = x, difference = mindiff))
#Unit: milliseconds
#  expr        min        lq      mean    median        uq      max neval cld
# baseR 117.194668 123.07353 125.98741 125.56882 127.78463 139.5318   100   b
#  Rcpp   7.907011  11.93539  14.47328  12.16848  12.38791 263.2796   100  a 

Upvotes: 3

Related Questions