VLC
VLC

Reputation: 1179

Get first element of the first sequence of length N of consecutively increasing numbers in R

I'm trying to find the most efficient way to solve the following. Suppose we have some data that look like this:

d1 <- seq(0, 3000, length.out = 1000)
d2 <- cos(seq(0, 6*pi, length.out = 1000))*rev(seq(0, 1, length.out = 1000))
dd <- as.data.frame(cbind(d1, d2))

enter image description here

I have the need to detect from d2 the first element of the first sequence of length 20 of consecutively increasing numbers. In the figure above it would be somewhere around d1 = 500. My current approach is based on this function:

getFirstBeforeSequence <- function(x, y, len){
a1 <- cbind(lapply(split(y, cumsum(c(1, diff(y) < 0))), length))
a2 <- which(a1 > len)[1]-1
a3 <- sum(unlist(a1)[1:a2])+1
a3  
}

This function gives me the desired output, the element is in position 164 and has occurred when d1 = 489.4895:

getFirstBeforeSequence(dd$d1, dd$d2, 20)
# 164
dd$d1[164]
# 489.4895

However, I have the impression that my solution is overly complex and I'm pretty sure that others will have better solutions. Any help will be very appreciated.

Upvotes: 2

Views: 701

Answers (3)

Michele
Michele

Reputation: 8753

This is slower, but gives a completely different approach:

firstOfSequence <- function(x, len){

  v <- paste0(sign(diff(x))+1L, collapse="")
  regexpr(paste0("([2])\\1{", len-1L, "}"), v)

}


> microbenchmark(
+   firstOfSequence(dd$d2, 20),
+   getFirstBefore(dd$d2, 20))
Unit: microseconds
                       expr     min       lq   median       uq      max neval
 firstOfSequence(dd$d1, 20) 978.181 981.3875 982.9910 998.7060 1111.597   100
  getFirstBefore(dd$d1, 20) 191.147 196.5990 200.4475 205.0975  333.865   100

Upvotes: 1

Henrik
Henrik

Reputation: 67778

y <- dd$d1

# indices of pits and peaks
pit <- which(diff(sign(diff(y))) == 2) + 1
peak <- which(diff(sign(diff(y))) == -2) + 1

# distance between peak and pit -> length of increase
len_incr <- peak - pit

# index of first pit from which a consecutive increase in 20 'steps' starts
idx <- pit[(len_incr > 20) == TRUE][1]

# corresponding x-value
dd$d2[idx]
# [1] 489.4895    


# similar approach but let 'turnpoint' find pits and peaks.
library(pastecs)
tp <- turnpoints(y)
pit <- which(tp$pits == TRUE)
peak <- which(tp$peaks == TRUE)
len_incr <- peak - pit
idx <- pit[(len_incr > 20) == TRUE][1]
dd$d2[idx]
# [1] 489.4895

Upvotes: 1

mrip
mrip

Reputation: 15163

Here is a stab:

getFirstBefore<-function(x,len){
  r<-rle(sign(diff(x)))
  n<-which(r$lengths>=len & r$values==1)
  if(length(n)==0)
    return(-1)
  1+sum(r$lengths[seq_len(n[1]-1)])
}

It's more efficient than the original, but there's probably still room for improvement:

microbenchmark(
  getFirstBeforeSequence(dd$d1,dd$d2,20),
  getFirstBefore(dd$d2,20))

# Unit: microseconds
#                                      expr      min       lq   median        uq
#  getFirstBeforeSequence(dd$d1, dd$d2, 20) 2433.174 2464.457 2486.186 2502.2005
#                 getFirstBefore(dd$d2, 20)  181.354  187.081  192.808  196.6805
#       max neval
#  9932.534   100
#   239.700   100

Upvotes: 2

Related Questions