Arne Brandschwede

Reputation: 85

Extract increasing and decreasing sequences from vector

I have a data frame with 2718 observations of which one column is of interest. It is a first difference series created with diff(). For ease, let's create a fake vector that resembles the data and pretend v is a first difference series. The NAs are introduced to make it similar to the original data.

# Create fake first difference series vector v
v <- runif(2718, -0.05, 0.05)
v <- append(NA, diff(v))

# Insert NAs at the beginning and end
v[c(1:8, 2712:2718)] <- NA

# Insert some NAs at random places in v
ind <- which(v %in% sample(v, 7))
v[ind] <- NA

I am interested in the sequences of v that show an increasing and decreasing behaviour. Specifically, I would like to extract the sequences of v that consecutively increase and decrease, respectively. In an increasing sequence, each element of v cannot be less than its preceding element and in a decreasing sequence, each element of v cannot be greater than its preceding element. Try to picture this when plotting v: Whenever the curve does not decrease (i.e. goes up or stays level), it is an increasing sequence and whenever the curve does not increase (i.e. goes down or stays level), it is a decreasing sequence.

To clarify, the procedure may be explained by this:

As v is a first difference series, the extracted element i (3rd bullet point) already represents the increase/decrease. For now, I do not want to limit the length of the sequences, hence a sequence might already be given by two elements.

I imagine storing the ith elements of v in a new vector (e.g. inc.v and dec.v) and afterwards finding maximum and mean increase/decrease of the sequences, as well as maximum and mean length of these sequences. The elements should be stored in inc.v or dec.v with respect to their original positions in v, so I can track them back. Each sequence in inc.v and dec.v should be easy to distinguish when they are separated by NA elements.

I tried writing this with a for loop and conditional statements but did not get far:

inc.v <- NULL
dec.v <- NULL
for (i in 2:length(v)) {
  if(!v[i] < v[i-1] | is.na(v[i])) {
    inc.v[i] <- v[i]
  } else if (!v[i] > v[i-1] | is.na(v[i])) {
    dec.v[i] <- v[i]
  } else {
    next
  }
}

The if and else if statements represent the fifth bullet point. I am aware of the problem that when v[i] equals v[i-1], it can qualify as both an increasing and a decreasing sequence, and it should be added to whichever sequence was stored previously. I just have no idea how to implement that. I think the sequences will be quite short, as the data is noisy and periods of no decrease/no increase won't last long. Hence, it might be a good idea to also try this operation with e.g. a 50/100-point moving mean:

# A symmetric moving average for v (the window is 51 points: 25 on each side)
f50 <- rep(1/51,51)
v_smooth <- filter(v, f50, sides = 2)

When running the loop as of now, the evaluation of the first condition results in an NA, giving me the error:

Error in if (!v[i] < v[i - 1] | is.na(v[i])) { : 
  missing value where TRUE/FALSE needed

I do not quite understand what's happening here because the is.na() statement should secure a TRUE or FALSE argument?!

Happy to hear your thoughts!

Upvotes: 1

Views: 892

Answers (3)

denis

Reputation: 5673

You should vectorize instead of looping, and apply conditions directly to the difference vector to create new columns that contain your increasing and decreasing values. The same approach works for the smoothed series. Here is an example:

library(data.table)
plouf <- setDT(list( v = v, diff = c(NA,diff(v))))
plouf[diff > 0,inc := v]
plouf[diff < 0, dec := v]

f50 <- rep(1/51,51)
plouf[,v_smooth := filter(v, f50, sides = 2)]
plouf[,diff_smooth :=c(NA,diff(v_smooth))]

plouf[diff_smooth > 0,inc_smooth := v_smooth]
plouf[diff_smooth < 0, dec_smooth := v_smooth]

To extract the size of each decrease, you need a grouping variable that increments at every change in the sign of diff, so that any operation can be performed on each increasing or decreasing sequence using by:

plouf[,grouptmp := abs(c(NA,diff(ifelse(diff>0,1,0))))]
plouf[is.na(grouptmp),grouptmp:= 0]
plouf[,group := cumsum(grouptmp)]

plouf[,decvalue := dec[.N] - dec[1], by = group]
plouf[,incvalue := inc[.N]-inc[1], by = group]

                  v          diff           inc           dec group     decvalue grouptmp
   1:            NA            NA            NA            NA     0           NA        0
   2:            NA            NA            NA            NA     0           NA        0
   3:            NA            NA            NA            NA     0           NA        0
   4:            NA            NA            NA            NA     0           NA        0
   5:            NA            NA            NA            NA     0           NA        0
   6:            NA            NA            NA            NA     0           NA        0
   7:            NA            NA            NA            NA     0           NA        0
   8:            NA            NA            NA            NA     0           NA        0
   9: -0.0344851657            NA            NA            NA     0           NA        0
  10:  0.0788633499  0.1133485156  0.0788633499            NA     0           NA        0
  11: -0.0415118591 -0.1203752090            NA -0.0415118591     1  0.000000000        1
  12:  0.0557818390  0.0972936981  0.0557818390            NA     2           NA        1
  13: -0.0314433977 -0.0872252367            NA -0.0314433977     3  0.000000000        1
  14:  0.0098391432  0.0412825409  0.0098391432            NA     4           NA        1
  15: -0.0147885296 -0.0246276728            NA -0.0147885296     5  0.000000000        1
  16: -0.0009157661  0.0138727635 -0.0009157661            NA     6           NA        1
  17:  0.0303060166  0.0312217827  0.0303060166            NA     6           NA        0
  18: -0.0384165912 -0.0687226078            NA -0.0384165912     7 -0.005185349        1
  19: -0.0436019399 -0.0051853487            NA -0.0436019399     7 -0.005185349        0
  20:  0.0821260908  0.1257280307  0.0821260908            NA     8           NA        1
  21: -0.0172987636 -0.0994248545            NA -0.0172987636     9 -0.003255037        1
  22: -0.0205538005 -0.0032550369            NA -0.0205538005     9 -0.003255037        0
  23: -0.0114417208  0.0091120797 -0.0114417208            NA    10           NA        1
  24:  0.0524503477  0.0638920686  0.0524503477            NA    10           NA        0
  25: -0.0105871856 -0.0630375333            NA -0.0105871856    11 -0.047042624        1
  26: -0.0576298093 -0.0470426237            NA -0.0576298093    11 -0.047042624        0
  27:  0.0031608195  0.0607906288  0.0031608195            NA    12           NA        1
  28: -0.0009828784 -0.0041436979            NA -0.0009828784    13  0.000000000        1
  29:  0.0167153471  0.0176982255  0.0167153471            NA    14           NA        1
  30:  0.0088964230 -0.0078189241            NA  0.0088964230    15 -0.033234568        1
  31:  0.0065035882 -0.0023928348            NA  0.0065035882    15 -0.033234568        0
  32: -0.0243381450 -0.0308417332            NA -0.0243381450    15 -0.033234568        0

You can then easily find the greatest or do whatever you want.
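As a rough illustration of that last step, here is a minimal base R sketch that summarises run lengths with rle() instead of the data.table grouping above. The toy vector and the names d, label, inc_len and dec_len are assumptions made for this example, not part of the original data.

```r
# Toy stand-in for v (assumed values, not the asker's data)
v <- c(NA, 0.1, 0.3, 0.4, 0.2, 0.1, NA, 0.5, 0.6)

# Change relative to the previous element, aligned with v
d <- c(NA, diff(v))

# Label every step; steps that involve an NA are marked "unknown"
label <- ifelse(is.na(d), "unknown",
         ifelse(d > 0, "inc", ifelse(d < 0, "dec", "flat")))

# Run-length encode the labels: one entry per consecutive run
runs <- rle(label)

# Lengths count steps (differences), not elements of v
inc_len <- runs$lengths[runs$values == "inc"]
dec_len <- runs$lengths[runs$values == "dec"]

c(max_inc = max(inc_len), mean_inc = mean(inc_len))
c(max_dec = max(dec_len), mean_dec = mean(dec_len))
```

Ties ("flat") are kept as their own label here; merging them into the preceding run, as the question asks, would need an extra step.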

Upvotes: 2

Kevin Cazelles

Reputation: 1255

Here is an attempt to answer your question (note that I slightly changed your example)

# Create fake first difference series vector v
v <- runif(2718, -0.05, 0.05)
v <- append(NA, diff(v))

# Insert NAs at the beginning and end
v[c(1:8, 2712:2718)] <- NA

# Insert some NAs at random places in v
v[sample(1:length(v), 7)] <- NA

# a couple of equal values
v[10:15] <- 1


# create an empty vector of character
out <- character(length(v)-1)
tmp <- diff(v)
# known increase
out[tmp>0] <- "I"
# known decrease
out[tmp<0] <- "D"
# no change
out[tmp == 0] <- "E"
# known NA
out[is.na(tmp)] <- NA
# replace E with the preceding value (I or D); if there is no way to know, use I by default
for (i in 1:length(out)) {
  if (!is.na(out[i]) & out[i] == "E") {
    if (i==1) {
      out[i] <- "I"
    } else {
      if (is.na(out[i-1])) {
        out[i] <- "I"
      } else out[i] <- out[i-1]
    }
  }
}

# Retrieve values 
dec.v <- inc.v <- rep(NA_real_, length(v))
idi <- which(out == "I")+1
inc.v[idi] <- v[idi]
idd <- which(out == "D")+1
dec.v[idd] <- v[idd]

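Also, regarding the error in your loop: the condition evaluates to NA whenever the comparison involves a missing value and is.na(v[i]) is FALSE, for example when only v[i-1] is NA. Put the is.na() checks first, test both v[i] and v[i-1], and combine them with the short-circuiting || so that the comparison is never evaluated on an NA.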
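A minimal sketch of how the original condition can end up as NA, and one possible guarded form. The toy vector and the names cond_original and cond_guarded are assumptions for illustration only.

```r
# Toy vector: v[1] is NA, v[2] is known (assumed values)
v <- c(NA, 0.2, 0.5, 0.1)
i <- 2

# Original test: !(0.2 < NA) is NA, is.na(0.2) is FALSE, and NA | FALSE is NA
cond_original <- !v[i] < v[i - 1] | is.na(v[i])
cond_original  # NA, so if() stops with "missing value where TRUE/FALSE needed"

# Guarded test: check both elements first; && short-circuits,
# so the comparison is skipped as soon as one check fails
cond_guarded <- !is.na(v[i]) && !is.na(v[i - 1]) && v[i] >= v[i - 1]
cond_guarded   # FALSE
```

Whether a step next to an NA should count as increasing, decreasing, or neither is a design choice; this sketch treats it as neither.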

Hope this helps :)

Upvotes: 1

Allen Wang

Reputation: 2502

You should really try a vectorized approach. This is probably a clearer way to find runs of increasing or decreasing values:

library(data.table)
data <- as.data.table(v)
data[, vl := shift(v, 1)]
data[, runs := rleid(vl > v)]

This uses the data.table library.
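One possible way to turn the run ids into a per-run summary is sketched below. It assumes data.table is installed; the toy vector and the names summary_dt, start, len and decreasing are made up for this example.

```r
library(data.table)

# Toy stand-in for v (assumed values, not the asker's data)
v <- c(NA, 0.1, 0.3, 0.4, 0.2, 0.1, NA, 0.5, 0.6)

data <- data.table(v = v)
data[, vl := shift(v, 1)]        # previous element
data[, runs := rleid(vl > v)]    # new id whenever the comparison result changes

# One row per run: first row index, number of rows, and direction
# (decreasing is NA for runs where the comparison involves a missing value)
summary_dt <- data[, .(start = .I[1], len = .N, decreasing = vl[1] > v[1]),
                   by = runs]
summary_dt
```

The start column keeps the original position in v, so each run can be traced back to the source vector.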

Upvotes: 1
