Rilcon42
Rilcon42

Reputation: 9763

counting lengths between alternating columns

I am trying to figure out how to count the number of rows from when one column says True to when the other column says True. I attempted to use run length encoding but couldnt figure out how to get the alternating values form each column.

set.seed(42)
s<-sample(c(0,1,2,3),500,replace=T)
isOverbought<-s==1
isOverSold<-s==0
head(cbind(isOverbought,isOverSold),20)
res<-rle(isOverSold)
tt<-res[res$values==0] #getting when Oversold is true

 > head(cbind(isOverbought,isOverSold))

[1,]        FALSE      FALSE
[2,]        FALSE      FALSE
[3,]         TRUE      FALSE <-starting condition is overbought
[4,]        FALSE      FALSE
[5,]        FALSE      FALSE
[6,]        FALSE      FALSE
[7,]        FALSE      FALSE
[8,]        FALSE       TRUE <-is oversold. length from overbought to oversold = 5
[9,]        FALSE      FALSE
[10,]        FALSE      FALSE
[11,]         TRUE      FALSE <- is overbought. length from oversold to overbought = 3
[12,]        FALSE      FALSE
[13,]        FALSE      FALSE
[14,]         TRUE      FALSE
[15,]         TRUE      FALSE
[16,]        FALSE      FALSE
[17,]        FALSE      FALSE
[18,]        FALSE       TRUE <-is oversold. length from overbought to oversold = 7
[19,]         TRUE      FALSE <- is overbought. length from oversold to overbought = 1 
[20,]        FALSE      FALSE

GOAL

overboughtTOoversold oversoldTOoverbought
5                     3
7                     1

Upvotes: 3

Views: 111

Answers (4)

hvollmeier
hvollmeier

Reputation: 2986

Here is a short version:

  • create a vector called mktState. Encode it with 1 if overbought is TRUE, -1 if oversold is TRUE and NA if both first 2 cols are FALSE.( You are interested only in days where the market state switches)
  • use na.locf to fill the NAs with the last observation carried forward
  • now use the rle function

    mktState <- ifelse(df$overBought == TRUE,1,ifelse(df$overSold == TRUE,-1,NA)) mktState <- na.locf(mktState)

to get 'overbought' runs:

> rle(mktState)$lengths[rle(mktState)$values == 1]
 [1]  5  7  3  8  2 10  7  3  1  2  4  2  5  6  3 11  4  1  5  2  4  6  1  1  8
[26]  7  3  1  1  1  1  3  2  3  1  6  1  1  1  3  2  4  2  1  6  8  8  1  5 15
[51]  2  5  4  2  1  1  3  4  7  1  7 11  1  3  4  2  4  1

and this will give you the 'oversold' runs:

> rle(mktState)$lengths[rle(mktState)$values == -1]
 [1]  3  1  5  6  2  4  1  4  3  3  3  5  2  4  1 14  2  2 10  3  7  1 13  1  1
[26]  3  3  1  6  5  2  1  8  7  2  3  1  1  3  5  1  1  2  3  1  2  2  3  3  1
[51]  8  9  4  2  1  6  2  1  3  2  4  5  1  3  7  4  2  2

Upvotes: 2

aichao
aichao

Reputation: 7455

The assumption for this answer is that there is at least one overbought/oversold transition (either direction) and hence at least two rows in the data. This condition can easily be checked by counting the number of overbought and oversold conditions and making sure that both are greater than one.

The key is to remove the consecutive overbought and oversold conditions so that we only have alternating overbought and oversold conditions. One way to do this is:

## detect where we are overbought and oversold
i1 <- which(isOverbought)
i2 <- which(isOverSold)
## concatenate into one vector
i3 <- c(i1,i2)
## sort these and get the indices from the sort
i4 <- order(i3)
## at this point consecutive overbought or oversold conditions
## will be marked by a difference of 1 in i4 while alternating
## conditions will be marked by something other than 1. So 
## filter those out to get i6. BTW, consecutive here does not mean
## consecutive rows in the data but consecutive occurrence of 
## either overbought or oversold conditions without an intervening 
## condition of the other. The assumption for at least one transition 
## in the data is needed for this to work.
i5 <- diff(i4)
i6 <- i4[c(1,which(i5 != 1)+1)]
## then recover the alternating rows of overbought and oversold conditions in i7
i7 <- i3[i6]
## take the difference and format the output
## I need to credit @akrun for this part
i8 <- diff(i7)
## need to determine which is first
if (i1[1] < i2[1]) {
  overboughtTOoversold <- i8[c(TRUE, FALSE)]
  oversoldTOoverbought <- i8[c(FALSE, TRUE)]
} else {
  overboughtTOoversold <- i8[c(FALSE, TRUE)]
  oversoldTOoverbought <- i8[c(TRUE, FALSE)]  
}
d1 <- cbind(overboughtTOoversold, oversoldTOoverbought)
print(head(d1))
##     overboughtTOoversold oversoldTOoverbought
##[1,]                    5                    3
##[2,]                    7                    1
##[3,]                    3                    5
##[4,]                    8                    6
##[5,]                    2                    2
##[6,]                   10                    4

The cbind may generate a warning that the columns are not the same length. To get rid of that, just pad with NA at the end as appropriate.

A more compact version of the above is:

i3 <- c(which(isOverbought), which(isOverSold))
i4 <- order(i3)
i8 <- diff(i3[i4[c(1,which(diff(i4) != 1)+1)]])
if (which(isOverbought)[1] < which(isOverSold)[1]) {
  overboughtTOoversold <- i8[c(TRUE, FALSE)]
  oversoldTOoverbought <- i8[c(FALSE, TRUE)]
} else {
  overboughtTOoversold <- i8[c(FALSE, TRUE)]
  oversoldTOoverbought <- i8[c(TRUE, FALSE)]  
}
d1 <- cbind(overboughtTOoversold, oversoldTOoverbought)

Upvotes: 2

Zheyuan Li
Zheyuan Li

Reputation: 73385

This is sufficient to solve your problem.

## `a` to `b`
a2b <- function (a, b) {
  x <- which(a)    ## position of `TRUE` in `a`
  y <- which(b)    ## position of `TRUE` in `b`
  z <- which(a | b)   ## position of all `TRUE`
  end <- match(y, z)    ## match for end position
  start <- c(1L, end[-length(end)] + 1L)    ## start position
  valid <- end > start  ## remove cases with `end = start`
  z[end[valid]] - z[start[valid]]
  }

## cross `a` and `b`
axb <- function (a, b) {
  if (any(a & b))
    stop ("Invalid input! `a` and `b` can't have TRUE at the same time!")
  x <- a2b(a, b); y <- a2b(b, a)
  if (which(a)[1L] < which(b)[1L]) cbind(a2b = x, b2a = c(NA_integer_, y))
  else cbind(a2b = c(NA_integer_, x), b2a = y)
  }

For your isOverbought and isOverSold, we obtain:

result <- axb(isOverbought, isOverSold)

head(result)
#     a2b b2a
#[1,]   5  NA
#[2,]   7   3
#[3,]   3   1
#[4,]   8   5
#[5,]   2   6
#[6,]  10   2

Since isOverbought has the first TRUE before isOverSold, the first element of the 2nd column is NA.

Upvotes: 2

alistaire
alistaire

Reputation: 43354

Here's a [somewhat long] tidyverse version:

library(dplyr)
library(tidyr)

# put vectors in a data.frame
data.frame(isOverbought, isOverSold) %>%
    # evaluate each row separately
    rowwise() %>% 
    # add column with name of event for any TRUE, else NA
    mutate(change_type = ifelse(isOverbought | isOverSold, names(.)[c(isOverbought, isOverSold)], NA)) %>% 
    # reset grouping
    ungroup() %>% 
    # replace NA values with last non-NA value
    fill(change_type) %>% 
    # add a column of the cumulate number of changes in change_type
    mutate(changes = data.table::rleid(change_type)) %>% 
    # count number of rows in each changes and change_type grouping
    count(changes, change_type) %>% 
    # remove leading NAs
    na.omit() %>% 
    # reset grouping
    ungroup() %>% 
    # edit change into runs of two with integer division
    mutate(changes = changes %/% 2) %>% 
    # spread to wide form
    spread(change_type, n) %>% 
    # get rid of extra column
    select(-changes)

## # A tibble: 68 x 2
##    isOverbought isOverSold
## *         <int>      <int>
## 1             5          3
## 2             7          1
## 3             3          5
## 4             8          6
## 5             2          2
## 6            10          4
## 7             7          1
## 8             3          4
## 9             1          3
## 10            2          3
## # ... with 58 more rows

Upvotes: 0

Related Questions