s____h

Reputation: 81

R: How to identify the first occurrence of a specific value of a variable grouped by ID

I have a dataset in the form of a diary, i.e. I have multiple entries for the same ID. In addition, I have a categorical variable (Yes/No) that indicates whether the event occurred or not.

ID <-  c(1,1,1,2,2,2,2,3,3,3,3,3,3)
event <- c("No", "No", "No", "Yes", "No", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "Yes") 
df <- data.frame(ID, event)

ID   event   
 1    No
 1    No
 1    No
 2    Yes
 2    No
 2    No
 2    Yes
 3    Yes
 3    Yes
 3    No
 3    No
 3    Yes
 3    Yes

I now want to delete the entries that come before the first "No" within each ID, so that every ID starts with a "No". After the first "No", a "Yes" may still occur. The desired output is:

ID   event   
 1    No
 1    No
 1    No
 2    No
 2    No
 2    Yes
 3    No
 3    No
 3    Yes
 3    Yes

Does anybody know how to achieve this? Thanks in advance for your time!

Upvotes: 3

Views: 1609

Answers (4)

GKi

Reputation: 39657

In base R you can use match to find the position of the first "No". To do this per ID, use split and lapply. To combine the results back into a data.frame, use rbind with do.call.

do.call(rbind, lapply(split(df, df$ID), function(x) {
  x[match("No", x$event):nrow(x),]}))
#     ID event
#1.1   1    No
#1.2   1    No
#1.3   1    No
#2.5   2    No
#2.6   2    No
#2.7   2   Yes
#3.10  3    No
#3.11  3    No
#3.12  3   Yes
#3.13  3   Yes
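
To see what match contributes in this approach, the per-ID positions can be inspected on their own. This is only a small sketch using the df from the question; the name first_no is introduced here for illustration and is not part of the answer.

```r
ID <- c(1,1,1,2,2,2,2,3,3,3,3,3,3)
event <- c("No", "No", "No", "Yes", "No", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "Yes")
df <- data.frame(ID, event)

# Position of the first "No" within each ID
# (first_no is a hypothetical name used only in this sketch)
first_no <- sapply(split(df$event, df$ID), function(ev) match("No", ev))
first_no
# ID 1 -> 1, ID 2 -> 2, ID 3 -> 3
```

Each group is then subset from that position to nrow(x). Note that match returns NA when a group contains no "No" at all, and NA:nrow(x) raises an error, so such IDs would need separate handling.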

Upvotes: 2

chinsoon12

Reputation: 25225

An option using data.table:

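library(data.table)
# setDT() converts df to a data.table by reference; per ID, keep event from the first "No" to the last row (.N)
setDT(df)[, .(event = event[match("No", event):.N]), by = ID]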

Upvotes: 2

arg0naut91

Reputation: 14764

Try:

library(dplyr)

df %>%
  group_by(ID) %>%
  filter(cumsum(event == 'No') >= 1)

Output:

# A tibble: 10 x 2
# Groups:   ID [3]
      ID event
   <int> <fct>
 1     1 No   
 2     1 No   
 3     1 No   
 4     2 No   
 5     2 No   
 6     2 Yes  
 7     3 No   
 8     3 No   
 9     3 Yes  
10     3 Yes 
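
Why the cumsum filter works can be traced on a single group. The following sketch uses the events of ID 3 from the question in plain base R; the variable names is_no, running and keep are chosen here for illustration only.

```r
# Events for ID 3 from the question
event <- c("Yes", "Yes", "No", "No", "Yes", "Yes")

is_no   <- event == "No"     # FALSE FALSE TRUE TRUE FALSE FALSE
running <- cumsum(is_no)     # 0 0 1 2 2 2: number of "No" seen so far
keep    <- running >= 1      # TRUE from the first "No" onwards

event[keep]                  # "No" "No" "Yes" "Yes"
```

For an ID that never has a "No", the running count stays at 0, so all of its rows are dropped rather than raising an error.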

Upvotes: 2

Ronak Shah

Reputation: 388907

We can find the position of the first "No" using which.max and select all rows from there to the last row of each group.

library(dplyr)
df %>% group_by(ID) %>% slice(which.max(event == 'No') : n())
#Also
#df %>% group_by(ID) %>% slice(which(event == 'No')[1] : n())

#      ID event
#   <dbl> <chr>
# 1     1 No   
# 2     1 No   
# 3     1 No   
# 4     2 No   
# 5     2 No   
# 6     2 Yes  
# 7     3 No   
# 8     3 No   
# 9     3 Yes  
#10     3 Yes  
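
The two variants above behave differently when a group contains no "No" at all, which does not occur in the example data. The sketch below shows this in base R on a made-up all-"Yes" vector; all_yes, pos_max and pos_which are hypothetical names used only here.

```r
# Hypothetical group in which the event never takes the value "No"
all_yes <- c("Yes", "Yes", "Yes")

# which.max returns the index of the first maximum; with all FALSE that is 1
pos_max <- which.max(all_yes == "No")     # 1

# which() finds no matching position, so taking the first element gives NA
pos_which <- which(all_yes == "No")[1]    # NA
```

With which.max, such a group would therefore be kept in full, while the which(...)[1] variant would produce NA:n(), which is an error. If such groups can occur in the data, an explicit check such as any(event == 'No') may be needed first.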

Upvotes: 2
