Reputation: 31452
I want to find positions in a vector where the value differs by more than some threshold value from an earlier point in the vector. The first change-point should be measured relative to the first value in the vector. Subsequent change-points should be measured relative to the previous change-point.
I can do this using a for
loop, but I wonder if there is a more idiomatic and faster vectorised soultion.
Minimal example:
set.seed(123)
x = cumsum(rnorm(500))
mindiff = 5.0
start = x[1]
changepoints = integer()
for (i in 1:length(x)) {
if (abs(x[i] - start) > mindiff) {
changepoints = c(changepoints, i)
start = x[i]
}
}
plot(x, type = 'l')
points(changepoints, x[changepoints], col='red')
Upvotes: 7
Views: 702
Reputation: 31452
For completeness, using recursion, we can get an answer that uses only R vectorised functions. However, this will not work on large vectors of results. E.g. In the OP example we get an "evaluation nested too deeply" error when length(x) == 1e5
N = length(x)
f.recurs = function(x, mindiff, i=1) {
next.i = i + which(abs(x[i:N]-x[i]) > mindiff)[1] - 1L
if (!is.na(next.i)) c(next.i, f.recurs(x, mindiff, next.i))
else NULL
}
f.recurs(x, 5.0)
# [1] 17 25 56 98 108 144 288 297 307 312 403 470 487
Upvotes: 0
Reputation: 4282
Here is a solution just using baseR Reduce
. Using the argument accumulate = TRUE
, reduce returns the result of every call to the function. In our case it will represent start
value of the solution using the for
loop. Once you have this vector, we only need to find the indexes where the value changes:
#Find the changepoints
r <- Reduce(function(a,e) {
if (abs(e - a) > mindiff)
e
else
a
}, x, accumulate =T)
# Get the indexes using diff
# changepoints <- head(cumsum(c(1,rle(r)$lengths)),-1)
changepoints <- which(!diff(r) == 0) + 1
EDIT: I have updated the answer using @Eric Watt's comment.
Upvotes: 3
Reputation: 32548
Implementing the same code in Rcpp
can help with speed.
library(Rcpp)
cppFunction(
"IntegerVector foo(NumericVector vect, double difference){
int start = 0;
IntegerVector changepoints;
for (int i = 0; i < vect.size(); i++){
if((vect[i] - vect[start]) > difference || (vect[start] - vect[i]) > difference){
changepoints.push_back (i+1);
start = i;
}
}
return(changepoints);
}"
)
foo(vect = x, difference = mindiff)
# [1] 17 25 56 98 108 144 288 297 307 312 403 470 487
identical(foo(vect = x, difference = mindiff), changepoints)
#[1] TRUE
Benchmarking
#DATA
set.seed(123)
x = cumsum(rnorm(1e5))
mindiff = 5.0
library(microbenchmark)
microbenchmark(baseR = {start = x[1]
changepoints = integer()
for (i in 1:length(x)) {
if (abs(x[i] - start) > mindiff) {
changepoints = c(changepoints, i)
start = x[i]
}
}}, Rcpp = foo(vect = x, difference = mindiff))
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# baseR 117.194668 123.07353 125.98741 125.56882 127.78463 139.5318 100 b
# Rcpp 7.907011 11.93539 14.47328 12.16848 12.38791 263.2796 100 a
Upvotes: 3