user12549787

Reputation: 33

Finding the first number after consecutive zeros in data frame

I have the following data frame:

library(tibble)

data <- tibble(ID=rep(c(1:2), each= 9), time = rep(1:9, 2), event = c(1,1,1,0,0,0,0,1,1,1,1,0,0,0,1,0,1,0))

For each subject, I want to retrieve the first row that has a "1" after consecutive zeros, i.e. row number 8 in the data frame for the first subject and row number 15 for the second subject.

Upvotes: 3

Views: 529

Answers (6)

G. Grothendieck

Reputation: 270045

1) oneAfter0 takes a vector of 0's and 1's and pastes them together into a single string. It then uses regexpr to find the first occurrence of 01 and returns a logical vector the same length as the input. That result is TRUE at the position of the 1 in that first match and FALSE elsewhere.

ave is used to apply that to each group and subset is used to subset out the rows corresponding to TRUE.

No packages are used.

oneAfter0 <- function(x) regexpr("01", paste(x, collapse = "")) + 1 == seq_along(x)
subset(data, ave(event, ID, FUN = oneAfter0) == 1)
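
To make the steps inside oneAfter0 concrete, here is a minimal trace on the first subject's events only. The variable names x, s, pos and flag are illustrative and not part of the original answer.

```r
# Events for the first subject, copied from the question's data
x <- c(1, 1, 1, 0, 0, 0, 0, 1, 1)

# Collapse the vector into one string so a regular expression can scan it
s <- paste(x, collapse = "")             # "111000011"

# regexpr() returns the start of the first "01" match (or -1 if none);
# adding 1 moves from the matched 0 to the 1 that follows it
pos <- as.integer(regexpr("01", s)) + 1  # 8

# Comparing against 1, 2, ..., length(x) marks only that position as TRUE
flag <- pos == seq_along(x)
which(flag)                              # 8
```

If there is no "01" in the string, regexpr returns -1, pos becomes 0, and every element of flag is FALSE, so no row is selected for that subject.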

2) This could alternately be written using dplyr like this:

library(dplyr)

data %>%
  group_by(ID) %>%
  filter(regexpr("01", paste(event, collapse = "")) + 1 == 1:n()) %>%
  ungroup()

Upvotes: 3

ThomasIsCoding

Reputation: 102529

Here is a base R solution with rle():

r <- rle(data$event)
df <- data[cumsum(r$lengths)[r$lengths > 1 & r$values==0]+1,]

such that

> df
   ID time event
8   1    8     1
15  2    6     1
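
The code above runs rle() over the whole event column. A possible per-ID variant of the same idea is sketched below, assuming that runs should not span two subjects and that only the first qualifying run per subject is wanted. The helper name first_after_zero_run is hypothetical and not part of the original answer.

```r
# Same data as in the question, built as a plain data.frame (no packages needed)
data <- data.frame(ID = rep(1:2, each = 9), time = rep(1:9, 2),
                   event = c(1,1,1,0,0,0,0,1,1,1,1,0,0,0,1,0,1,0))

# Hypothetical helper: position of the first element that follows a run of
# two or more zeros, or NA if there is no such run or it ends the vector
first_after_zero_run <- function(x) {
  r <- rle(x)
  ends <- cumsum(r$lengths)                       # last position of each run
  hit <- ends[r$values == 0 & r$lengths > 1] + 1  # position right after each zero run
  hit <- hit[hit <= length(x)]                    # drop a zero run at the very end
  if (length(hit)) hit[1] else NA_integer_
}

# Apply per subject and convert within-group positions to row numbers
rows <- unlist(lapply(split(seq_len(nrow(data)), data$ID), function(i) {
  i[first_after_zero_run(data$event[i])]
}))
df <- data[rows[!is.na(rows)], ]
```

For the question's data this selects rows 8 and 15, the same rows as the output shown above.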

Upvotes: 1

Richard Careaga

Reputation: 670

This is a purposely didactic version of Ronak Shah's answer. It shows, inelegantly but stepwise, how to use the run lengths from rle to get row indices for identifying runs of zeros and the non-zero events that follow them.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
data <- tibble(ID=rep(c(1:2), each= 9), time = rep(1:9, 2), event = c(1,1,1,0,0,0,0,1,1,1,1,0,0,0,1,0,1,0))
runs <- rle(data$event)
runs <- tibble(runs$lengths, runs$values)
colnames(runs) <- c("lengths", "values")
sequences <- tibble(lengths = runs$lengths, values = runs$values) %>% mutate(indices = cumsum(runs$lengths))
post_zero <- sequences %>%  filter(values == 0)
result <- left_join(sequences, post_zero, by = "indices") %>% select(1:3) %>% filter(values.x == 1)
colnames(result) <- c("lengths", "runs", "indices")
data[result$indices,]
#> # A tibble: 4 x 3
#>      ID  time event
#>   <int> <int> <dbl>
#> 1     1     3     1
#> 2     2     2     1
#> 3     2     6     1
#> 4     2     8     1

Created on 2019-12-16 by the reprex package (v0.3.0)

Upvotes: 0

Melissa Key

Reputation: 4551

My answer is very similar to Eric's, but requires 2 zeros instead of 1.

-- edited to limit the results to only the first occurrence instead of all.

library(dplyr)

data <- tibble(ID=rep(c(1:2), each= 9), time = rep(1:9, 2), event = c(1,1,1,0,0,0,0,1,1,1,1,0,0,0,1,0,1,0))

data %>%
  group_by(ID) %>%
  filter(
    event == 1,
    dplyr::lag(event) == 0,
    dplyr::lag(event, 2) == 0,
    cumsum(event == 1 &          # this limits the results to the first occurrence
        dplyr::lag(event, default = 1) == 0 &
        dplyr::lag(event, default = 1, n = 2) == 0) == 1
  )

Upvotes: 1

Eric

Reputation: 1389

A tidyverse answer, if I understand your question correctly:

library(dplyr)
data %>% 
  filter(event==1,lag(event)==0)

Upvotes: 1

Ronak Shah

Reputation: 389215

We can use rle to select the first row after the first run of consecutive zeroes in each group (ID).

library(dplyr)

data %>%
 group_by(ID) %>%
 slice(with(rle(event == 0), sum(lengths[1:which.max(values)])) + 1)

#     ID  time event
#  <int> <int> <dbl>
#1     1     8     1
#2     2     6     1
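
The expression inside slice() can be unpacked step by step. Below is a minimal trace on the second subject's events only. The variable names r, first_zero_run and pos are illustrative and not part of the original answer.

```r
# Events for the second subject, copied from the question's data
event <- c(1, 1, 0, 0, 0, 1, 0, 1, 0)

r <- rle(event == 0)
r$values    # FALSE TRUE FALSE TRUE FALSE TRUE
r$lengths   # 2 3 1 1 1 1

# which.max() on a logical vector returns the index of the first TRUE,
# i.e. the first run of zeros
first_zero_run <- which.max(r$values)            # 2

# Summing the run lengths up to and including that run gives the position
# of its last zero; the next position is the row to keep
pos <- sum(r$lengths[1:first_zero_run]) + 1      # 6
```

Note that the first run of zeros is used whatever its length, so a single isolated 0 that comes before any longer run would also count.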

Upvotes: 3
