excluding some rows based on condition from a data frame using R

Question

I have some data such as this

structure(list(id = c(1, 1, 2, 3, 4, 4, 5), deathdate = c("2007/04/10", 
"2007/04/10", "2004/04/01", "NA", "NA", "2018/01/01", "2016/01/02"
), admidate = c("2007/03/08", "2007/04/11", "2004/04/15", "2012/10/20", 
"2017/10/14", "2018/01/02", "2015/12/20")), class = "data.frame", row.names = c(NA, 
-7L))

and I want the rows where the death date is less than admidate to be removed from the new df such as this

structure(list(id2 = c(1, 3, 4, 5), deathdate2 = c("2007/04/10", 
"NA", "NA", "2016/01/02"), admidate2 = c("2007/03/08", "2012/10/20", 
"2017/10/14", "2015/12/20")), class = "data.frame", row.names = c(NA, 
-4L))

I tried this

    deathbefore <- with(df,(!is.na(deathdate))& !is.na(admidate)& deathdate < admidate)

df2 <- df[-deathbefore,]

However, it doesn't solve the problem.

Ronak Shah · Accepted Answer

Change the dates to date object and select rows where deathdate > admidate or has a NA value.

library(dplyr)

df %>%
  mutate(across(contains('date'), na_if, "NA"),
        across(contains('date'), lubridate::ymd)) %>%
  filter(deathdate > admidate | is.na(deathdate) | is.na(admidate))

#  id  deathdate   admidate
#1  1 2007-04-10 2007-03-08
#2  3        2012-10-20
#3  4        2017-10-14
#4  5 2016-01-02 2015-12-20

excluding some rows based on condition from a data frame using R

Answers (2)

Related Questions