Reputation: 1179
I'm trying to find the most efficient way to solve the following. Suppose we have some data that look like this:
d1 <- seq(0, 3000, length.out = 1000)
d2 <- cos(seq(0, 6*pi, length.out = 1000))*rev(seq(0, 1, length.out = 1000))
dd <- as.data.frame(cbind(d1, d2))
I have the need to detect from d2
the first element of the first sequence of length 20 of consecutively increasing numbers. In the figure above it would be somewhere around d1 = 500
. My current approach is based on this function:
getFirstBeforeSequence <- function(x, y, len){
a1 <- cbind(lapply(split(y, cumsum(c(1, diff(y) < 0))), length))
a2 <- which(a1 > len)[1]-1
a3 <- sum(unlist(a1)[1:a2])+1
a3
}
This function gives me the desired output, the element is in position 164 and has occurred when d1 = 489.4895
:
getFirstBeforeSequence(dd$d1, dd$d2, 20)
# 164
dd$d1[164]
# 489.4895
However, I have the impression that my solution is overly complex and I'm pretty sure that others will have better solutions. Any help will be very appreciated.
Upvotes: 2
Views: 701
Reputation: 8753
This is slower, but gives a completely different approach:
firstOfSequence <- function(x, len){
v <- paste0(sign(diff(x))+1L, collapse="")
regexpr(paste0("([2])\\1{", len-1L, "}"), v)
}
> microbenchmark(
+ firstOfSequence(dd$d2, 20),
+ getFirstBefore(dd$d2, 20))
Unit: microseconds
expr min lq median uq max neval
firstOfSequence(dd$d1, 20) 978.181 981.3875 982.9910 998.7060 1111.597 100
getFirstBefore(dd$d1, 20) 191.147 196.5990 200.4475 205.0975 333.865 100
Upvotes: 1
Reputation: 67778
y <- dd$d1
# indices of pits and peaks
pit <- which(diff(sign(diff(y))) == 2) + 1
peak <- which(diff(sign(diff(y))) == -2) + 1
# distance between peak and pit -> length of increase
len_incr <- peak - pit
# index of first pit from which a consecutive increase in 20 'steps' starts
idx <- pit[(len_incr > 20) == TRUE][1]
# corresponding x-value
dd$d2[idx]
# [1] 489.4895
# similar approach but let 'turnpoint' find pits and peaks.
library(pastecs)
tp <- turnpoints(y)
pit <- which(tp$pits == TRUE)
peak <- which(tp$peaks == TRUE)
len_incr <- peak - pit
idx <- pit[(len_incr > 20) == TRUE][1]
dd$d2[idx]
# [1] 489.4895
Upvotes: 1
Reputation: 15163
Here is a stab:
getFirstBefore<-function(x,len){
r<-rle(sign(diff(x)))
n<-which(r$lengths>=len & r$values==1)
if(length(n)==0)
return(-1)
1+sum(r$lengths[seq_len(n[1]-1)])
}
It's more efficient than the original, but there's probably still room for improvement:
microbenchmark(
getFirstBeforeSequence(dd$d1,dd$d2,20),
getFirstBefore(dd$d2,20))
# Unit: microseconds
# expr min lq median uq
# getFirstBeforeSequence(dd$d1, dd$d2, 20) 2433.174 2464.457 2486.186 2502.2005
# getFirstBefore(dd$d2, 20) 181.354 187.081 192.808 196.6805
# max neval
# 9932.534 100
# 239.700 100
Upvotes: 2