Reputation: 67
I have a dataframe with ID's, entry dates, events (X1.0 and X1.1) and the dates corresponding to event times (X2.0 and X2.1). I need to find ID's of people who had a specific event before the entry date.
library(dplyr)
df <- data.frame(ID = as.integer(c(10001, 10002, 10004, 10005, 10006)),
entry_date = as.Date(c("2013-12-22", "2013-02-13", "2013-03-13", "2013-01-01", "2013-02-02")),
X1.0 = as.integer(c(150, NA, 120, NA, 150)),
X1.1 = as.integer(c(120, 180, NA, 150, 120)),
X2.0 = as.Date(c("2017-02-13", NA, "2014-06-04", NA, "1990-03-08")),
X2.1 = as.Date(c("2003-12-22", "2011-02-13", NA, "1998-01-01", "2015-02-01")))
#filtering columns that begin with X1 by event 150
> dplyr::filter_at(df, vars(starts_with("X1")), any_vars(. == 150))
ID entry_date X1.0 X1.1 X2.0 X2.1
1 10001 2013-12-22 150 120 2017-02-13 2003-12-22
2 10005 2013-01-01 NA 150 <NA> 1998-01-01
3 10006 2013-02-02 150 120 1990-03-08 2015-02-01
My problem is filtering the dates conditionally because I'm only interested in filtering the "150" events that have occurred before the entry date, and there are other events present. End result should look like this:
ID entry_date X1.0 X1.1 X2.0 X2.1
1 10005 2013-01-01 NA 150 <NA> 1998-01-01
2 10006 2013-02-02 150 120 1990-03-08 2015-02-01
Upvotes: 0
Views: 210
Reputation: 3002
here is a solution piping to another filter function:
dplyr::filter_at(df, vars(starts_with("X1")), any_vars(. == 150)) %>%
filter(X2.0 <= entry_date | is.na(X2.0))
# ID entry_date X1.0 X1.1 X2.0 X2.1
#1 10005 2013-01-01 NA 150 <NA> 1998-01-01
#2 10006 2013-02-02 150 120 1990-03-08 2015-02-01
Upvotes: 0
Reputation: 388982
We can get the data in long format, getting event value and event date in separate columns. We can then select only those rows which have event value as 150 and entry_date
greater than event date. We join this data back to df
to get all the corresponding columns.
library(dplyr)
df %>%
tidyr::pivot_longer(cols = -c(ID, entry_date),
names_to = c('.value', 'col'),
names_sep = "\\.") %>%
filter(X1 == 150 & entry_date > X2) %>%
select(ID) %>%
left_join(df, by = 'ID')
# ID entry_date X1.0 X1.1 X2.0 X2.1
# <int> <date> <int> <int> <date> <date>
#1 10005 2013-01-01 NA 150 NA 1998-01-01
#2 10006 2013-02-02 150 120 1990-03-08 2015-02-01
Upvotes: 1