alison123456

Reputation: 11

R filter per group for first and last timepoints only

I have a dataset with the columns ID, date, and coverage. Each ID has a different number of dates, and coverage is an integer from 0 to 3. For each ID, I want to keep only the rows from the earliest date where coverage == 3 through the latest date where coverage == 3, dropping everything before and after. Example input:

ID date coverage
001 2012-12-24 2
001 2013-12-04 3
001 2014-12-14 1
001 2015-12-02 3
001 2016-12-02 0
002 2012-01-15 3
002 2013-11-15 1
002 2014-11-15 3
003 2019-01-15 1
003 2020-11-15 1
003 2021-11-15 3

Example output:

ID date coverage
001 2013-12-04 3
001 2014-12-14 1
001 2015-12-02 3
002 2012-01-15 3
002 2013-11-15 1
002 2014-11-15 3
003 2021-11-15 3

Upvotes: 1

Views: 66

Answers (1)

akrun

Reputation: 886938

We arrange by 'ID' and 'date', group by 'ID', and slice each group from the first row where coverage is 3 through the last such row. Note that if an ID has no 3 in coverage, match(3, coverage) returns NA and the index range is not valid, so we need an if/else condition: return NULL in the else branch to drop those IDs, or return row_number() to keep all of their rows.
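To make the index logic concrete, here is a minimal base R sketch on a single group's coverage vector (the values for ID 001 from the question). The variable names are only illustrative, and tail(which(...), 1) stands in for dplyr's last() so that the snippet runs without any packages.

```r
# coverage values for ID 001, already in date order
coverage <- c(2L, 3L, 1L, 3L, 0L)

# position of the first 3
first_three <- match(3, coverage)

# position of the last 3 (base R stand-in for dplyr::last())
last_three <- tail(which(coverage == 3), 1)

# positions to keep: everything from the first 3 through the last 3
keep <- first_three:last_three
print(coverage[keep])

# with no 3 present, match() returns NA, which is why the guard is needed
no_three <- match(3, c(1L, 1L))
print(no_three)
```

Inside slice(), the same range is computed once per ID on the grouped data.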

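library(dplyr)

df1 %>%
   # sort so that positions within each ID follow date order
   arrange(ID, date) %>%
   group_by(ID) %>%
   # keep positions from the first coverage == 3 through the last one;
   # IDs without any 3 return NULL, i.e. no rows
   slice(if (3 %in% coverage)
      match(3, coverage):last(which(coverage == 3)) else NULL) %>%
   # to keep all rows for IDs without a 3, use this slice() instead:
   #  slice(if (3 %in% coverage)
   #    match(3, coverage):last(which(coverage == 3)) else row_number()) %>%
   ungroup()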

-output

# A tibble: 7 × 3
     ID date       coverage
  <int> <date>        <int>
1     1 2013-12-04        3
2     1 2014-12-14        1
3     1 2015-12-02        3
4     2 2012-01-15        3
5     2 2013-11-15        1
6     2 2014-11-15        3
7     3 2021-11-15        3

data

df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
3L), date = structure(c(15698, 16043, 16418, 16771, 17137, 15354, 
16024, 16389, 17911, 18581, 18946), class = "Date"), coverage = c(2L, 
3L, 1L, 3L, 0L, 3L, 1L, 3L, 1L, 1L, 3L)), row.names = c(NA, -11L
), class = "data.frame")
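As a possible alternative (not part of the approach above), the same rows can be selected with filter() and two cumulative sums, which avoids the explicit if/else. This is only a sketch: the data frame is rebuilt inline so the snippet is self-contained, and ID 4 is a hypothetical extra group with no 3 in coverage, added here to show that such IDs drop out.

```r
library(dplyr)

# same data as above, plus a hypothetical ID 4 that never reaches coverage 3
df1 <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L),
  date = as.Date(c("2012-12-24", "2013-12-04", "2014-12-14", "2015-12-02",
                   "2016-12-02", "2012-01-15", "2013-11-15", "2014-11-15",
                   "2019-01-15", "2020-11-15", "2021-11-15",
                   "2020-01-01", "2021-01-01")),
  coverage = c(2L, 3L, 1L, 3L, 0L, 3L, 1L, 3L, 1L, 1L, 3L, 0L, 2L)
)

res <- df1 %>%
  arrange(ID, date) %>%
  group_by(ID) %>%
  filter(
    # TRUE from the first 3 onwards
    cumsum(coverage == 3) > 0,
    # TRUE up to and including the last 3
    rev(cumsum(rev(coverage == 3))) > 0
  ) %>%
  ungroup()

print(res)
```

Because both conditions are FALSE for every row of a group without a 3, those IDs are removed without a separate check; if their rows should be kept instead, the slice() version with row_number() is the more direct route.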

Upvotes: 0
