Reputation: 3394
DATA = data.frame(STUDENT = c(1,1,1,2,3,3,3,3),
PASS = c(1, 0, 1, 0, 0, 0, 0, 1),
DATE = c('2019-12-01', '2020-03-11', '2020-06-17', '2020-02-09',
'2020-10-12', '2020-11-01', '2020-12-07', '2020-06-05'),
WANT = c(1,1,1,0,0,0,0,0))
I have DATA which has STUDENT PASS DATE and I wish to create the VARIABLE WANT such that:
For every STUDENT, find the MAXIMUM(DATE)
and then check if that student has any values PASS = 1 at any time between MAXIMUM(DATE)
and when 'DATE' is 6 months before the value of MAXIMUM(DATE)
, if so then WANT = 1 otherwise WANT equals to zero.
I try this with no success
DATA$WANT = with(WANT, ifelse(DATE[PASS == 1] < max(DATE) & DATE[PASS == 1] >= max(DATE) %m-% months(6),1,0)
Upvotes: 0
Views: 86
Reputation: 1965
This checks to see if DATE is less than max(DATE) minus 6 months. If so, then WANT is 0. Otherwise check to see if PASS == 1. If it is, then WANT is 1, otherwise it's 0 grouped by STUDENT.
Then I group by STUDENT to see if, WANT == 1 exists. If so then assign 1 to all rows for each STUDENT.
library(data.table)
library(lubridate)
data = setDT(data)[, DATE := as_date(DATE)][order(STUDENT, DATE)]
data[, WANT := ifelse(DATE < (max(DATE) %m-% months(6)), 0,
ifelse(PASS == 1, 1, 0)), by = STUDENT]
data[, WANT := ifelse(any(WANT == 1), 1, 0), by = STUDENT]
STUDENT PASS DATE WANT
1: 1 1 2019-12-01 1
2: 1 0 2020-03-11 1
3: 1 1 2020-06-17 1
4: 2 0 2020-02-09 0
5: 3 1 2020-06-05 0
6: 3 0 2020-10-12 0
7: 3 0 2020-11-01 0
8: 3 0 2020-12-07 0
Upvotes: 2