Reputation: 81

R: How to identify the first occurrence of a specific value of a variable grouped by ID

I have a dataset in the form of a diary, i.e. I have multiple entries for the same ID. In addition, I have a categorical variable (Yes/No) that indicates whether the event occurred or not.

ID <-  c(1,1,1,2,2,2,2,3,3,3,3,3,3)
event <- c("No", "No", "No", "Yes", "No", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "Yes") 
df <- data.frame(ID, event)

ID   event   
 1    No
 1    No
 1    No
 2    Yes
 2    No
 2    No
 2    Yes
 3    Yes
 3    Yes
 3    No
 3    No
 3    Yes
 3    Yes

I now want to delete, within each ID, all entries that come before the first "No", so every ID should start with a "No". However, after the first "No" there can still be a "Yes". So the desired output I want is:

ID   event   
 1    No
 1    No
 1    No
 2    No
 2    No
 2    Yes
 3    No
 3    No
 3    Yes
 3    Yes

Does anybody know how to achieve this? Thanks in advance for your time!

Upvotes: 3

Answers (4)

GKi

Reputation: 39657

In base R you can use match to find the position of the first "No". To do this per ID you can use split and lapply. To put the result back into a single data.frame you can use rbind with do.call.

do.call(rbind, lapply(split(df, df$ID), function(x) {
  x[match("No", x$event):nrow(x),]}))
#     ID event
#1.1   1    No
#1.2   1    No
#1.3   1    No
#2.5   2    No
#2.6   2    No
#2.7   2   Yes
#3.10  3    No
#3.11  3    No
#3.12  3   Yes
#3.13  3   Yes
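
One edge case worth noting: if an ID never contains a "No", match returns NA and NA:nrow(x) throws an error. The following is only a sketch of one way to guard against that, assuming such groups should be dropped entirely. The helper name keep_from_first_no and the small df2 example are made up for illustration.

```r
# Hypothetical helper: keep rows from the first "No" onward within one group;
# return zero rows if the group never contains "No" (an assumed policy).
keep_from_first_no <- function(x) {
  first_no <- match("No", x$event)          # NA when "No" is absent
  if (is.na(first_no)) return(x[0, , drop = FALSE])
  x[first_no:nrow(x), , drop = FALSE]
}

# Illustrative data: ID 1 has no "No" at all
df2 <- data.frame(
  ID    = c(1, 1, 2, 2, 2),
  event = c("Yes", "Yes", "Yes", "No", "Yes")
)

res <- do.call(rbind, lapply(split(df2, df2$ID), keep_from_first_no))
res
```

Here only the last two rows of ID 2 remain. If you would rather keep such groups unchanged, return x instead of the empty subset.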

Upvotes: 2

chinsoon12

Reputation: 25225

An option using data.table:

library(data.table)
setDT(df)[, .(event=event[match("No", event):.N]), ID]
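
Note that the call above only returns ID and event. If the real data has more columns, a possible variant is to subset .SD instead, which keeps every non-grouping column. This is a sketch under that assumption; the extra day column is invented purely for illustration.

```r
library(data.table)

ID    <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3)
event <- c("No", "No", "No", "Yes", "No", "No", "Yes",
           "Yes", "Yes", "No", "No", "Yes", "Yes")

# `day` is a hypothetical extra column standing in for other diary variables
dt <- data.table(ID, event, day = seq_along(ID))

# Within each ID, keep rows from the first "No" through the last row (.N)
res <- dt[, .SD[match("No", event):.N], by = ID]
res
```

As with the base version, this assumes every ID contains at least one "No"; otherwise match returns NA and the range fails.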

Upvotes: 2

arg0naut91

Reputation: 14764

Try:

library(dplyr)

df %>%
  group_by(ID) %>%
  filter(cumsum(event == 'No') >= 1)

Output:

# A tibble: 10 x 2
# Groups:   ID [3]
      ID event
   <int> <fct>
 1     1 No   
 2     1 No   
 3     1 No   
 4     2 No   
 5     2 No   
 6     2 Yes  
 7     3 No   
 8     3 No   
 9     3 Yes  
10     3 Yes
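
To see why the cumsum condition works, it may help to trace it on a single group in base R. The vector below is just an illustrative example, not part of the original data.

```r
event <- c("Yes", "No", "No", "Yes")

is_no   <- event == "No"     # FALSE  TRUE  TRUE FALSE
running <- cumsum(is_no)     # 0 1 2 2  (count of "No" seen so far)
keep    <- running >= 1      # FALSE  TRUE  TRUE  TRUE

kept <- event[keep]
kept                         # "No" "No" "Yes"
```

The running count stays at 0 until the first "No" and never drops back, so every row from that point on passes the filter. A group without any "No" stays at 0 throughout and is removed completely.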

Upvotes: 2

Ronak Shah

Reputation: 388907

We can get the position of the first "No" using which.max and select all the rows from there through the last row.

library(dplyr)
df %>% group_by(ID) %>% slice(which.max(event == 'No') : n())
#Also
#df %>% group_by(ID) %>% slice(which(event == 'No')[1] : n())

#      ID event
#   <dbl> <chr>
# 1     1 No   
# 2     1 No   
# 3     1 No   
# 4     2 No   
# 5     2 No   
# 6     2 Yes  
# 7     3 No   
# 8     3 No   
# 9     3 Yes  
#10     3 Yes
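
The two variants differ when a group has no "No" at all, which the example data does not cover. A small base R check on a made-up vector shows what each lookup returns:

```r
all_yes <- c("Yes", "Yes")   # illustrative group without any "No"

pos_which_max <- which.max(all_yes == "No")   # 1: every value is FALSE, so the first index wins
pos_which     <- which(all_yes == "No")[1]    # NA: there is no match to pick
pos_match     <- match("No", all_yes)         # NA as well

pos_which_max
pos_which
pos_match
```

So with which.max such a group would be kept in full, while the which(...)[1] form produces NA, and NA:n() is not a valid range. Which behaviour you want depends on your data, so it is worth checking for such groups first.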

Upvotes: 2
