Reputation: 81
I have a dataset in the form of a diary, i.e. I have multiple entries for the same ID. In addition, I have a categorical variable (Yes/No) that indicates whether the event occurred or not.
ID <- c(1,1,1,2,2,2,2,3,3,3,3,3,3)
event <- c("No", "No", "No", "Yes", "No", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "Yes")
df <- data.frame(ID, event)
ID event
1 No
1 No
1 No
2 Yes
2 No
2 No
2 Yes
3 Yes
3 Yes
3 No
3 No
3 Yes
3 Yes
I now want to delete the entries before the first "No" of each ID, so that every ID starts with a "No". After the first "No", however, a "Yes" can still occur and should be kept. The desired output is:
ID event
1 No
1 No
1 No
2 No
2 No
2 Yes
3 No
3 No
3 Yes
3 Yes
Does anybody know how to achieve this? Thanks in advance for your time!
Reputation: 39657
In base R you can use match to find the position of the first "No". To do this per ID, use split and lapply, and to put the result back into a data.frame, use rbind with do.call.
do.call(rbind, lapply(split(df, df$ID), function(x) {
  # keep the rows from the first "No" to the last row of this ID
  x[match("No", x$event):nrow(x), ]
}))
# ID event
#1.1 1 No
#1.2 1 No
#1.3 1 No
#2.5 2 No
#2.6 2 No
#2.7 2 Yes
#3.10 3 No
#3.11 3 No
#3.12 3 Yes
#3.13 3 Yes
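One caveat: if an ID contains no "No" at all, match returns NA and NA:nrow(x) stops with an "NA/NaN argument" error. Below is a sketch of the same split/lapply approach, assuming such IDs should simply be dropped. The extra ID 4 (only "Yes") is made up to show the case and is not part of the question's data.

```r
# Question's data plus a hypothetical ID 4 that never has a "No"
ID <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4)
event <- c("No", "No", "No", "Yes", "No", "No", "Yes",
           "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes", "Yes")
df2 <- data.frame(ID, event)

res <- do.call(rbind, lapply(split(df2, df2$ID), function(x) {
  first_no <- match("No", x$event)
  # No "No" in this ID: return zero rows instead of failing on NA:nrow(x)
  if (is.na(first_no)) return(x[0, ])
  x[first_no:nrow(x), ]
}))
res
```

Whether all-"Yes" IDs should be dropped or kept depends on the analysis; the guard makes that choice explicit instead of erroring.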
Reputation: 25225
An option using data.table:
library(data.table)
setDT(df)[, .(event=event[match("No", event):.N]), ID]
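Within [ with by = ID, .N is the number of rows in the current group, so for each ID the j expression reduces to ordinary vector indexing. A plain R sketch (no data.table needed) of what it computes for ID 3:

```r
# Events of ID 3 from the question; .N corresponds to length(event) here
event <- c("Yes", "Yes", "No", "No", "Yes", "Yes")
idx <- match("No", event):length(event)  # positions of the first "No" to the end
idx         # [1] 3 4 5 6
event[idx]  # [1] "No"  "No"  "Yes" "Yes"
```

Because j builds a new list containing only event, the result has just the ID and event columns; any further columns of df would have to be listed in .() as well.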
Reputation: 14764
Try:
library(dplyr)
df %>%
  group_by(ID) %>%
  # running count of "No" per ID; it is 0 only before the first "No"
  filter(cumsum(event == 'No') >= 1)
Output:
# A tibble: 10 x 2
# Groups: ID [3]
ID event
<int> <fct>
1 1 No
2 1 No
3 1 No
4 2 No
5 2 No
6 2 Yes
7 3 No
8 3 No
9 3 Yes
10 3 Yes
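The same cumulative-sum idea also works in base R without dplyr: ave() applies cumsum within each ID and returns a vector aligned with the original rows, which can be used directly as a row filter. A minimal sketch on the question's data:

```r
ID <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3)
event <- c("No", "No", "No", "Yes", "No", "No", "Yes",
           "Yes", "Yes", "No", "No", "Yes", "Yes")
df <- data.frame(ID, event)

# Running count of "No" within each ID; rows before the first "No" have count 0
n_no <- ave(as.integer(df$event == "No"), df$ID, FUN = cumsum)
out <- df[n_no >= 1, ]
out
```

With this formulation an ID that never has a "No" keeps a count of 0 throughout and therefore contributes no rows, rather than causing an error. dplyr also provides cumany(), so filter(cumany(event == "No")) expresses the same condition.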
Reputation: 388907
We can get the position of the first "No" using which.max and select all rows from there to the last row.
library(dplyr)
df %>% group_by(ID) %>% slice(which.max(event == 'No') : n())
#Also
#df %>% group_by(ID) %>% slice(which(event == 'No')[1] : n())
# ID event
# <dbl> <chr>
# 1 1 No
# 2 1 No
# 3 1 No
# 4 2 No
# 5 2 No
# 6 2 Yes
# 7 3 No
# 8 3 No
# 9 3 Yes
#10 3 Yes
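Note that which.max on an all-FALSE logical vector returns 1 rather than signalling that there is no "No", so an ID consisting only of "Yes" would be kept in full; the which(...)[1] variant returns NA instead, which makes 1:n() fail. A small base R check of both behaviours, plus a guarded index (first_no_index is a hypothetical helper, not part of dplyr):

```r
yes_only <- c("Yes", "Yes", "Yes")

# which.max() returns 1 when no element is TRUE, so all rows would be kept
which.max(yes_only == "No")   # [1] 1
# which(...)[1] returns NA when nothing matches
which(yes_only == "No")[1]    # [1] NA

# Hypothetical helper: position of the first "No", or NA when there is none
first_no_index <- function(events) {
  hit <- events == "No"
  if (any(hit)) which.max(hit) else NA_integer_
}
first_no_index(c("Yes", "No", "Yes"))  # [1] 2
first_no_index(yes_only)               # [1] NA
```

For the question's data every ID contains a "No", so both slice() versions give the output shown; the difference only matters for IDs without any "No".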