89_Simple

Reputation: 3805

R: find consecutive occurrence of a number

First, define helper functions that bind a list of vectors row-wise and column-wise:

# append a list of vectors row-wise
rbindlist <- function(list) {
  n <- length(list)
  res <- NULL
  # seq_len() is safe for an empty list, where seq(n) would yield c(1, 0)
  for (i in seq_len(n)) res <- rbind(res, list[[i]])
  return(res)
}

# append a list of vectors column-wise
cbindlist <- function(list) {
  n <- length(list)
  res <- NULL
  for (i in seq_len(n)) res <- cbind(res, list[[i]])
  return(res)
}

# generate sample data
sample.dat <- list()
set.seed(123)
for (i in 1:365) {
  vec1 <- sample(c(0, 1), replace = TRUE, size = 5)
  sample.dat[[i]] <- vec1
}

dat <- rbindlist(sample.dat)
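As an aside, the same row-wise and column-wise binding can be sketched with base R's do.call(), which avoids growing the result inside a loop. The small list vecs below is made up for illustration and is not part of the sample data.

```r
# Hypothetical toy input: three vectors of equal length
vecs <- list(c(0, 1, 1), c(1, 0, 0), c(1, 1, 0))

# do.call() passes every list element as a separate argument to rbind()/cbind()
by_row <- do.call(rbind, vecs)   # 3 x 3 matrix, one vector per row
by_col <- do.call(cbind, vecs)   # 3 x 3 matrix, one vector per column

dim(by_row)
by_row[2, ]   # the second vector
by_col[, 2]   # also the second vector
```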

dat has five columns, one per location, and 365 rows, one per day of the year, with values 1 or 0. I have a second matrix (see below) that holds certain days of the year for each column (location) in dat.

# generate second sample data
set.seed(123)
sample.dat1 <- list()
for (i in 1:5) {
  vec1 <- sort(sample(c(258:365), replace = TRUE, size = 4), decreasing = FALSE)
  sample.dat1[[i]] <- vec1
}

dat1 <- cbindlist(sample.dat1)

I need to use dat1 to subset days in dat for a calculation. For example:

1) For location 1 (the first column in both dat1 and dat): in column 1 of dat, select days 289 to 302 (taken from dat1) and find the longest consecutive run of 1s. Then select days 303 (302 + 1) to 343 and find the longest consecutive run of 1s. Finally, select days 344 (343 + 1) to 353 and do the same.

2) Repeat this for all columns.
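The range construction described in step 1 can be sketched as follows. The helper name make_ranges is hypothetical, and the breakpoints are the ones quoted in the example above; this only illustrates the described start/end days, not the code used later in the question.

```r
# Hypothetical helper: turn sorted breakpoints into start/end day pairs.
# The first range starts at the first breakpoint; each later range starts
# one day after the previous breakpoint, as described in step 1.
make_ranges <- function(breaks) {
  n <- length(breaks)
  start <- c(breaks[1], breaks[-c(1, n)] + 1)
  end   <- breaks[-1]
  data.frame(start = start, end = end)
}

ranges <- make_ranges(c(289, 302, 343, 353))
ranges
```

With these breakpoints the ranges are 289 to 302, 303 to 343, and 344 to 353, so no day is counted twice.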

If I want the sum of 1s in each range, I can do this:

    library(tibble)
    library(purrr)

    dat <- as_tibble(dat)   # as.tibble() is deprecated in favour of as_tibble()
    dat1 <- as_tibble(dat1)

    pmap(list(dat,dat1), ~ {
       range1 <- ..2[1]
       range2 <- ..2[2]
       range3 <- ..2[3]
       range4 <- ..2[4]

       sum.range1 <- sum(..1[range1:range2]) # this will generate sum between range 1 and range 2
       sum.range2 <- sum(..1[range2:range3]) # this will generate sum between range 2 and range 3
       sum.range3 <- sum(..1[range3:range4]) # this will generate sum between range 3 and range 4

       c(sum.range1=sum.range1,sum.range2=sum.range2,sum.range3=sum.range3) 

    }) 

For the longest consecutive occurrence of 1 within each range, I thought of using the rle function. Example below:

    pmap(list(dat,dat1), ~ {
       range1 <- ..2[1]
       range2 <- ..2[2]
       range3 <- ..2[3]
       range4 <- ..2[4]

       spell.range1 <- rle(..1[range1:range2]) # run-length encode the range; this covers runs of ANY value (0 OR 1)
       spell.1.range1 <- tapply(spell.range1$lengths, spell.range1$values, max)[2] # this should select the maximum consecutive run of 1

       spell.range2 <- rle(..1[range2:range3]) # run-length encode the range; this covers runs of ANY value (0 OR 1)
       spell.1.range2 <- tapply(spell.range2$lengths, spell.range2$values, max)[2] # this should select the maximum consecutive run of 1

       spell.range3 <- rle(..1[range3:range4]) # run-length encode the range; this covers runs of ANY value (0 OR 1)
       spell.1.range3 <- tapply(spell.range3$lengths, spell.range3$values, max)[2] # this should select the maximum consecutive run of 1

       c(spell.1.range1 = spell.1.range1, spell.1.range2 = spell.1.range2, spell.1.range3 = spell.1.range3)

    })

I get an error, which I think is because I am not using the rle function properly here. I would really like to keep the code structured as above, since my other code follows the same pattern and the output format suits my needs, so I would appreciate suggestions on how to fix it.
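One possible source of trouble in the rle() code above, offered only as a guess since no error message is given: tapply(...)[2] picks the second group by position, which is the group for 1 only when both 0 and 1 occur in the range. A minimal sketch with made-up vectors, comparing selection by position and by name:

```r
# Made-up input: a range that contains only 1s
runs_ones <- rle(c(1, 1, 1))
by_value_ones <- tapply(runs_ones$lengths, runs_ones$values, max)

by_value_ones[2]     # NA: there is no second group
by_value_ones["1"]   # 3: selecting by name finds the run of 1s

# Made-up input with both values present: position and name agree
runs_mixed <- rle(c(0, 1, 1, 0, 1))
by_value_mixed <- tapply(runs_mixed$lengths, runs_mixed$values, max)

by_value_mixed[2]    # 2
by_value_mixed["1"]  # 2
```

Selecting by name still returns NA when a range contains no 1s at all, which matches the "no run found" case.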

Upvotes: 1

Views: 2402

Answers (1)

Uwe

Reputation: 42544

OP's code does work for me. So, without a specific error message it is impossible to understand why the code is not working for the OP.

However, the sample datasets created by the OP are matrices (before they were coerced to tibble) and I felt challenged to find a way to solve the task in base R without using purrr:

To find the maximum number of consecutive occurrences of a particular value val in a vector x, we can use the following function:

max_rle <- function(x, val) {
  y <- rle(x)
  len <- y$lengths[y$values == val]
  if (length(len) > 0) max(len) else NA
}

Examples:

max_rle(c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1), 1)
[1] 4
max_rle(c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1), 0)
[1] 2

# find consecutive occurrences in column batches
lapply(seq_len(ncol(dat1)), function(col_num) {
  start <- head(dat1[, col_num], -1L)
  end   <- tail(dat1[, col_num], -1L) - 1
  sapply(seq_along(start), function(range_num) {
    max_rle(dat[start[range_num]:end[range_num], col_num], 1)
  })
})
[[1]]
[1] 8 4 5

[[2]]
[1] 4 5 2

[[3]]
[1] NA  3  4

[[4]]
[1] 5 5 4

[[5]]
[1] 3 2 3

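The outer lapply() loops over the columns of dat and dat1. The inner sapply() loops over the row ranges stored in dat1 and subsets dat accordingly: each range starts at one entry of the dat1 column and ends on the day before the next entry. NA indicates that a range contains no 1 at all.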
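How head() and tail() build the ranges can be seen on a small example. The breakpoints and the 0/1 series below are made up for illustration, and the logic of max_rle() is repeated inline so the snippet runs on its own.

```r
# Hypothetical breakpoints for one location
breaks <- c(3, 6, 9, 12)

start <- head(breaks, -1L)       # 3 6 9: every breakpoint except the last
end   <- tail(breaks, -1L) - 1   # 5 8 11: the day before the next breakpoint

# Made-up 0/1 series of 12 days
x <- c(0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1)

# Longest run of 1s in each range, using the same rle() logic as max_rle()
longest <- vapply(seq_along(start), function(i) {
  runs <- rle(x[start[i]:end[i]])
  len  <- runs$lengths[runs$values == 1]
  if (length(len) > 0) max(len) else NA_integer_
}, integer(1))

longest   # 2 3 NA
```

Note that with this construction the final breakpoint day itself (day 12 here) falls outside every range, which differs slightly from the inclusive ranges described in the question.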

Upvotes: 3
