Robert Curtis

Reputation: 21

Extracting elements from a vector while skipping a fixed number of elements

Say I have this vector:

a <- round(runif(100, 0, 1), digits = 0)

I want to find the first element of the vector that equals 1. Once it is found, skip the next 3 elements (even if they include 1s), then find the next element that equals 1, and so on: find a 1, skip 3 elements, repeat.

My desired output is the index of the first element that equals 1, followed by the indices of the remaining 1s, after accounting for the skipped elements.

Upvotes: 2

Views: 880

Answers (3)

jrlewi

Reputation: 506

Perhaps a while loop?

set.seed(123)
a <- round(runif(100,0,1), digits =0)
n <- length(a)
i <- 1
index_save <- numeric(n)    # 1 marks a kept 1, 0 everything else
while (i <= n) {
  if (a[i] == 1) {
    index_save[i] <- 1
    i <- i + 4    # move past this 1 and the 3 elements to skip
  } else {
    i <- i + 1
  }
}
head(cbind(a, index_save), 20)
      a index_save
 [1,] 0          0
 [2,] 1          1
 [3,] 0          0
 [4,] 1          0
 [5,] 1          0
 [6,] 0          0
 [7,] 1          1
 [8,] 1          0
 [9,] 1          0
[10,] 0          0
[11,] 1          1
[12,] 0          0
[13,] 1          0
[14,] 1          0
[15,] 0          0
[16,] 1          1
[17,] 0          0
[18,] 0          0
[19,] 0          0
[20,] 1          1

You can extract the indices with which(index_save == 1).
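If you need this more than once, the same loop could be wrapped in a small function that returns the indices directly. This is only a sketch: the name `find_ones_skipping` and the `skip` argument are made up for illustration and are not part of the question.

```r
# Hypothetical helper (name and `skip` argument are assumptions):
# returns the indices of the 1s that are kept when `skip` elements
# are ignored after every hit.
find_ones_skipping <- function(a, skip = 3) {
  keep <- integer(0)
  i <- 1
  n <- length(a)
  while (i <= n) {
    if (a[i] == 1) {
      keep <- c(keep, i)   # record the hit
      i <- i + skip + 1    # jump past the skipped elements
    } else {
      i <- i + 1
    }
  }
  keep
}

find_ones_skipping(c(0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1))
#> [1]  2  7 11
```

The small input is the first 11 values of the example above, so the result lines up with the first three 1s in the index_save column.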

Upvotes: 0

alistaire

Reputation: 43334

You could use Reduce with accumulate = TRUE or purrr::accumulate, though you'll need to iterate over a list with separate elements for the result and the skip count, e.g.

library(tidyverse)
set.seed(47)

df_ones <- tibble(a = rbinom(100, 1, .5),    # make sample data (tibble() replaces the deprecated data_frame())
                  is_one = a,    # initialize result and count
                  count = NA) %>%
    split(seq(nrow(.))) %>%    # split into list of one-row data frames
    accumulate(    # for each sequential pair of elements, return and pass on a list of...
        ~list(a = .y$a,    # the original value for checking,
              is_one = if(.x$count <= 3) 0 else .y$is_one,    # the result, changing 1 to 0 where required, and 
              # the count since a 1, resetting when a 1 is kept
              count = if(.x$count > 3 && .y$is_one) {
                  1
              } else {
                  .x$count + 1
              }),
        .init = list(a = NA, is_one = 0, count = 4)    # set initial .x value
    ) %>% 
    bind_rows() %>%    # collapse resulting list to data frame
    slice(-1)    # remove row for initializing value

df_ones
#> # A tibble: 100 x 3
#>        a is_one count
#>    <int>  <dbl> <dbl>
#>  1     1      1     1
#>  2     0      0     2
#>  3     1      0     3
#>  4     1      0     4
#>  5     1      1     1
#>  6     1      0     2
#>  7     0      0     3
#>  8     0      0     4
#>  9     1      1     1
#> 10     1      0     2
#> # ... with 90 more rows

To extract the indices,

df_ones %>% 
    pull(is_one) %>%    # extract result as vector
    as.logical() %>%    # coerce to Boolean
    which()    # get indices of TRUE values
#>  [1]  1  5  9 14 22 29 35 40 44 48 52 56 61 66 71 75 79 84 88 93 97
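For the base R route mentioned at the top, a rough equivalent could carry just the index of the most recently kept 1 through `Reduce`. This is a minimal sketch on the first ten values of `a` from the output above; the names `next_kept` and `state` are made up for illustration.

```r
# First ten values of `a` from the printed tibble
a <- c(1, 0, 1, 1, 1, 1, 0, 0, 1, 1)

# Hypothetical helper: the state passed along is the index of the most
# recently kept 1 (-Inf until the first one is found).
next_kept <- function(last_kept, i) {
  if (a[i] == 1 && i - last_kept > 3) i else last_kept
}

# accumulate = TRUE returns every intermediate state; drop the initial value
state <- Reduce(next_kept, seq_along(a), accumulate = TRUE, init = -Inf)[-1]

# a position is a kept 1 exactly where the running state equals its own index
which(state == seq_along(a))
#> [1] 1 5 9
```

Because the state is a single number rather than a list, there is no need to split the data into one-row data frames first.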

Upvotes: 0

dww

Reputation: 31452

I don't think you can do this without resorting to some kind of loop. Here's one way to do it. First, we get a vector of the positions of all the ones. Then we repeatedly find the first element of this vector that is 3 or less away from its predecessor and remove it, until no remaining one is too close to the one before it.

x = which(a==1) 
repeat  {
  to.remove = which(diff(x) <= 3)[1] + 1  
  if (is.na(to.remove)) break
  x = x[-to.remove]
}

If you are dealing with very large vectors, there may be more efficient ways to do this; consider Rcpp if speed is an issue.
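One candidate, short of Rcpp, is to walk over the positions of the ones a single time instead of recomputing diff after every removal. The sketch below assumes that approach; `thin_positions` and its `skip` argument are hypothetical names, and it has not been benchmarked against the repeat loop.

```r
# Hypothetical helper (not from the answer above): keep a position only if
# it lies more than `skip` elements after the last position that was kept.
thin_positions <- function(x, skip = 3) {
  keep <- logical(length(x))
  last <- -Inf                 # nothing kept yet
  for (k in seq_along(x)) {
    if (x[k] - last > skip) {
      keep[k] <- TRUE
      last <- x[k]
    }
  }
  x[keep]
}

a <- c(0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1)
thin_positions(which(a == 1))
#> [1]  2  7 11
```

The loop still runs in R, but it visits each position once, so the work grows linearly with the number of ones.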

Upvotes: 1
