Reputation: 623
In the following dataset, I want to remove all rows starting at the first instance, sorted by Time
and grouped by ID
, that Var
is TRUE. Put differently, I want to subset all rows for each ID
by those which are FALSE up until the first TRUE, sorted by Time
.
ID <- c('A','B','C','A','B','C','A','B','C','A','B','C')
Time <- c(3,3,3,6,6,6,9,9,9,12,12,12)
Var <- c(F,F,F,T,T,F,T,T,F,T,F,T)
data = data.frame(ID, Time, Var)
data
ID Time Var
1 A 3 FALSE
2 B 3 FALSE
3 C 3 FALSE
4 A 6 TRUE
5 B 6 TRUE
6 C 6 FALSE
7 A 9 TRUE
8 B 9 TRUE
9 C 9 FALSE
10 A 12 TRUE
11 B 12 FALSE
12 C 12 TRUE
The desired result for this data frame should be:
ID Time Var
A 3 FALSE
B 3 FALSE
C 3 FALSE
C 6 FALSE
C 9 FALSE
Note that the solution should not only remove rows where Var
== TRUE, but should also remove rows where Var
== FALSE but this follows (in Time
) another instance where Var
== TRUE for that ID
.
I've tried many different things but can't seem to figure this out. Any help is much appreciated!
Upvotes: 1
Views: 759
Reputation: 18622
You can do this with the cumall
verb as well:
library(dplyr)
data %>%
dplyr::group_by(ID) %>%
dplyr::filter(dplyr::cumall(!Var))
ID Time Var
<chr> <dbl> <lgl>
1 A 3 FALSE
2 B 3 FALSE
3 C 3 FALSE
4 C 6 FALSE
5 C 9 FALSE
cumall(!x): all cases until the first TRUE
Upvotes: 0
Reputation: 16277
Here's how to do that with dplyr
using group_by
and cumsum
.
The rationale is that Var is a logical vector where FALSE is equal to 0 and TRUE is equal to 1. cumsum
will remain at 0 until it hits the first TRUE.
library(dplyr)
data%>%
group_by(ID)%>%
filter(cumsum(Var)<1)
ID Time Var
<fctr> <dbl> <lgl>
1 A 3 FALSE
2 B 3 FALSE
3 C 3 FALSE
4 C 6 FALSE
5 C 9 FALSE
Here's the equivalent code with data.table
:
library(data.table)
data[data[, .I[cumsum(Var) <1], by = ID]$V1]
ID Time Var
1: A 3 FALSE
2: B 3 FALSE
3: C 3 FALSE
4: C 6 FALSE
5: C 9 FALSE
Upvotes: 2
Reputation: 1709
This data.table
solution should work.
library(data.table)
> setDT(data)[, .SD[1:(which.max(Var)-1)], by=ID]
ID Time Var
1: A 3 FALSE
2: B 3 FALSE
3: C 3 FALSE
4: C 6 FALSE
5: C 9 FALSE
Given that you want all the values up to the first TRUE value, which.max
is the way to go.
Upvotes: 0