maxxzi
maxxzi

Reputation: 67

R - Exclude rows with events by date

I have a dataframe with ID's, entry dates, events (X1.0 and X1.1) and the dates corresponding to event times (X2.0 and X2.1). I need to find ID's of people who had a specific event before the entry date.

library(dplyr)

df <- data.frame(ID = as.integer(c(10001, 10002, 10004, 10005, 10006)),
                 entry_date = as.Date(c("2013-12-22", "2013-02-13", "2013-03-13", "2013-01-01", "2013-02-02")), 
                 X1.0 = as.integer(c(150, NA, 120, NA, 150)),
                 X1.1 = as.integer(c(120, 180, NA, 150, 120)),
                 X2.0 = as.Date(c("2017-02-13", NA, "2014-06-04", NA, "1990-03-08")), 
                 X2.1 = as.Date(c("2003-12-22", "2011-02-13", NA, "1998-01-01", "2015-02-01")))

#filtering columns that begin with X1 by event 150

> dplyr::filter_at(df, vars(starts_with("X1")), any_vars(. == 150)) 

     ID entry_date X1.0 X1.1       X2.0       X2.1
1 10001 2013-12-22  150  120 2017-02-13 2003-12-22
2 10005 2013-01-01   NA  150       <NA> 1998-01-01
3 10006 2013-02-02  150  120 1990-03-08 2015-02-01

My problem is filtering the dates conditionally because I'm only interested in filtering the "150" events that have occurred before the entry date, and there are other events present. End result should look like this:

     ID entry_date X1.0 X1.1       X2.0       X2.1
1 10005 2013-01-01   NA  150       <NA> 1998-01-01
2 10006 2013-02-02  150  120 1990-03-08 2015-02-01

Upvotes: 0

Views: 210

Answers (2)

user12256545
user12256545

Reputation: 3002

here is a solution piping to another filter function:


dplyr::filter_at(df, vars(starts_with("X1")), any_vars(. == 150)) %>% 
                                             filter(X2.0 <= entry_date | is.na(X2.0))


#     ID entry_date X1.0 X1.1       X2.0       X2.1
#1 10005 2013-01-01   NA  150       <NA> 1998-01-01
#2 10006 2013-02-02  150  120 1990-03-08 2015-02-01

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388982

We can get the data in long format, getting event value and event date in separate columns. We can then select only those rows which have event value as 150 and entry_date greater than event date. We join this data back to df to get all the corresponding columns.

library(dplyr)

df %>%
  tidyr::pivot_longer(cols = -c(ID, entry_date), 
                      names_to = c('.value', 'col'), 
                      names_sep = "\\.") %>%
   filter(X1 == 150 & entry_date > X2) %>%
   select(ID) %>%
   left_join(df, by = 'ID')


#    ID entry_date  X1.0  X1.1 X2.0       X2.1      
#  <int> <date>     <int> <int> <date>     <date>    
#1 10005 2013-01-01    NA   150 NA         1998-01-01
#2 10006 2013-02-02   150   120 1990-03-08 2015-02-01

Upvotes: 1

Related Questions