Reputation: 13
I have a dataset that contains multiple observations per individual and observations were taken yearly. There are 2 conditions (TRUE or FALSE). Some individuals have multiple observations per condition, others have multiple observations for only 1 condition:
ID = c(1,1,1,2,2,3,3,3,4,4,4,5,5,5,5)
Cond = c(FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE)
date = as.Date(c("2017-05-02", "2018-07-04", "2019-06-18","2011-05-05", "2012-08-09", "2015-06-13", "2016-02-08", "2017-07-22", "2014-05-03", "2015-09-05", "2016-04-07", "2016-03-06", "2017-07-23", "2018-09-28", "2019-05-29"))
df = data.frame(ID, Cond, date)
Which then looks like this:
ID Cond date
1 1 FALSE 2017-05-02
2 1 TRUE 2018-07-04
3 1 TRUE 2019-06-18
4 2 FALSE 2011-05-05
5 2 FALSE 2012-08-09
6 3 FALSE 2015-06-13
7 3 FALSE 2016-02-08
8 3 TRUE 2017-07-22
9 4 FALSE 2014-05-03
10 4 FALSE 2015-09-05
11 4 FALSE 2016-04-07
12 5 FALSE 2016-03-06
13 5 FALSE 2017-07-23
14 5 TRUE 2018-09-28
15 5 TRUE 2019-05-29
I want to subset the data so that each individual has one row for the last observation where the condition was FALSE and a second row for the first observation where the condition is TRUE, excluding all individuals who only have observations for one condition. The above example would then look like:
ID Cond date
1 1 FALSE 2017-05-02
2 1 TRUE 2018-07-04
3 3 FALSE 2016-02-08
4 3 TRUE 2017-07-22
5 5 FALSE 2017-07-23
6 5 TRUE 2018-09-28
I'm just not sure on how to do this with a large dataset.
Upvotes: 1
Views: 411
Reputation: 30494
I suspect there is likely a better way, but here is one approach using dplyr
. First, group_by
each ID
and then use slice
for obtain the last Cond
which is FALSE
using tail
, and the first Cond
which is TRUE
using head
. Finally, only include results where there are two rows for a given ID
.
library(dplyr)
df %>%
arrange(ID, date) %>%
group_by(ID) %>%
slice(c(tail(which(!Cond), 1), head(which(Cond), 1))) %>%
filter(n() == 2)
Output
ID Cond date
<dbl> <lgl> <date>
1 1 FALSE 2017-05-02
2 1 TRUE 2018-07-04
3 3 FALSE 2016-02-08
4 3 TRUE 2017-07-22
5 5 FALSE 2017-07-23
6 5 TRUE 2018-09-28
Upvotes: 1