Reputation: 33
I have the following data frame:
data <- tibble(ID=rep(c(1:2), each= 9), time = rep(1:9, 2), event = c(1,1,1,0,0,0,0,1,1,1,1,0,0,0,1,0,1,0))
For each subject, I want to retrieve the first row that has a "1" after consecutive zeros, i.e. row 8 of the data frame for the first subject and row 15 for the second subject.
Upvotes: 3
Views: 529
Reputation: 270045
1) oneAfter0 takes a vector of 0's and 1's and pastes them together into a single string. It then uses regexpr to find the first occurrence of 01 and returns a logical vector the same length as the input. That result is TRUE at the position of the first 1 that follows a 0 and FALSE elsewhere. ave is used to apply that to each group, and subset is used to keep the rows corresponding to TRUE. No packages are used.
oneAfter0 <- function(x) regexpr("01", paste(x, collapse = "")) + 1 == seq_along(x)
subset(data, ave(event, ID, FUN = oneAfter0) == 1)
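To see what oneAfter0 returns before it is handed to ave, it may help to run it on a single vector. The following is only a minimal sketch: x is the first subject's event column, and y is a made-up vector (not from the question) that contains no "01".

```r
# Same helper as in the answer above
oneAfter0 <- function(x) regexpr("01", paste(x, collapse = "")) + 1 == seq_along(x)

# Events of the first subject
x <- c(1, 1, 1, 0, 0, 0, 0, 1, 1)
paste(x, collapse = "")        # "111000011"
hit <- which(oneAfter0(x))     # 8: the 1 that follows the zeros

# No "01" in the string: regexpr returns -1, so no position matches
y <- c(1, 1, 0)
no_hit <- which(oneAfter0(y))  # integer(0)
```

Because regexpr reports only the first match, later 0-to-1 transitions in the same group are ignored. Note that the pattern 01 also matches a 1 that follows a single zero.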
2) This could alternately be written using dplyr like this:
library(dplyr)
data %>%
group_by(ID) %>%
filter(regexpr("01", paste(event, collapse = "")) + 1 == 1:n()) %>%
ungroup
Upvotes: 3
Reputation: 102529
Here is a base R solution with rle():
r <- rle(data$event)
df <- data[cumsum(r$lengths)[r$lengths > 1 & r$values==0]+1,]
such that
> df
ID time event
8 1 8 1
15 2 6 1
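The indexing in that one-liner can be unpacked step by step. This is a rough sketch on the first subject's events only; the names run_ends and idx are illustrative and not part of the answer.

```r
event <- c(1, 1, 1, 0, 0, 0, 0, 1, 1)

r <- rle(event)
r$lengths                      # 3 4 2  (length of each run)
r$values                       # 1 0 1  (value repeated in each run)

# Position of the last element of each run
run_ends <- cumsum(r$lengths)  # 3 7 9

# Runs of more than one zero; the row right after such a run is the target
idx <- run_ends[r$lengths > 1 & r$values == 0] + 1   # 8
```

Two things to keep in mind with this approach: rle is applied to the whole column, so runs are not split by ID, and every qualifying run is returned rather than only the first per subject. On the sample data this happens to give exactly one row per ID. A run of several zeros at the very end of the column would also produce an index past the last row.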
Upvotes: 1
Reputation: 670
This is a purposely didactic version of Ronak Shah's answer, showing inelegantly but stepwise how to use the run lengths from rle to capture the row indices that identify runs of zeros and the following non-zero events.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
data <- tibble(ID=rep(c(1:2), each= 9), time = rep(1:9, 2), event = c(1,1,1,0,0,0,0,1,1,1,1,0,0,0,1,0,1,0))
runs <- rle(data$event)
runs <- tibble(runs$lengths, runs$values)
colnames(runs) <- c("lengths", "values")
sequences <- tibble(lengths = runs$lengths, values = runs$values) %>% mutate(indices = cumsum(runs$lengths))
post_zero <- sequences %>% filter(values == 0)
result <- left_join(sequences, post_zero, by = "indices") %>% select(1:3) %>% filter(values.x == 1)
colnames(result) <- c("lengths", "runs", "indices")
data[result$indices,]
#> # A tibble: 4 x 3
#> ID time event
#> <int> <int> <dbl>
#> 1 1 3 1
#> 2 2 2 1
#> 3 2 6 1
#> 4 2 8 1
Created on 2019-12-16 by the reprex package (v0.3.0)
Upvotes: 0
Reputation: 4551
My answer is very similar to Eric's, but requires two zeros instead of one.
Edit: limited the results to the first occurrence only, instead of all of them.
library(dplyr)
data <- tibble(ID=rep(c(1:2), each= 9), time = rep(1:9, 2), event = c(1,1,1,0,0,0,0,1,1,1,1,0,0,0,1,0,1,0))
data %>%
group_by(ID) %>%
filter(
event == 1,
dplyr::lag(event) == 0,
dplyr::lag(event, 2) == 0,
cumsum(event == 1 & # this limits the results to the first occurrence
dplyr::lag(event, default = 1) == 0 &
dplyr::lag(event, default = 1, n = 2) == 0) == 1
)
Upvotes: 1
Reputation: 1389
A tidyverse answer, if I understand your question correctly:
library(dplyr)
data %>%
filter(event==1,lag(event)==0)
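On the sample data, the ungrouped filter above keeps rows 8, 15 and 17, and lag looks across the boundary between subjects. If only the first such row per subject is wanted, one possible variant (a sketch, not the original answer) is to group by ID and keep the first hit; first_hits is just an illustrative name.

```r
library(dplyr)

data <- tibble(ID = rep(1:2, each = 9), time = rep(1:9, 2),
               event = c(1, 1, 1, 0, 0, 0, 0, 1, 1,
                         1, 1, 0, 0, 0, 1, 0, 1, 0))

first_hits <- data %>%
  group_by(ID) %>%                                   # lag() now restarts for each subject
  filter(event == 1, dplyr::lag(event) == 0) %>%     # a 1 directly preceded by a 0
  slice(1) %>%                                       # keep only the first hit per subject
  ungroup()

first_hits$time   # 8 6
```

Like the original, this accepts a 1 after a single zero; it does not require a run of several zeros.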
Upvotes: 1
Reputation: 389215
We can use rle to select the first row after the first run of consecutive zeroes in each group (ID).
library(dplyr)
data %>%
group_by(ID) %>%
slice(with(rle(event == 0), sum(lengths[1:which.max(values)])) + 1)
# ID time event
# <int> <int> <dbl>
#1 1 8 1
#2 2 6 1
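The expression inside slice can be read more easily when spelled out on one group at a time. Below is a small sketch in base R; the helper name first_after_zero_run is hypothetical, and it assumes each group contains at least one zero that is followed by another row.

```r
# Hypothetical helper: position of the first element after the first run of zeros
first_after_zero_run <- function(event) {
  r <- rle(event == 0)                     # runs of TRUE (zeros) and FALSE (non-zeros)
  first_zero_run <- which.max(r$values)    # index of the first TRUE run
  sum(r$lengths[1:first_zero_run]) + 1     # rows up to the end of that run, plus one
}

id1 <- first_after_zero_run(c(1, 1, 1, 0, 0, 0, 0, 1, 1))  # 8
id2 <- first_after_zero_run(c(1, 1, 0, 0, 0, 1, 0, 1, 0))  # 6
```

Within group_by(ID), the value is a position inside the group, which is why slice returns time 8 for the first subject and time 6 for the second.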
Upvotes: 3