kam
kam

Reputation: 345

excluding some rows based on condition from a data frame using R

I have some data such as this

structure(list(id = c(1, 1, 2, 3, 4, 4, 5), deathdate = c("2007/04/10", 
"2007/04/10", "2004/04/01", "NA", "NA", "2018/01/01", "2016/01/02"
), admidate = c("2007/03/08", "2007/04/11", "2004/04/15", "2012/10/20", 
"2017/10/14", "2018/01/02", "2015/12/20")), class = "data.frame", row.names = c(NA, 
-7L))

and I want the rows where the death date is less than admidate to be removed from the new df such as this

structure(list(id2 = c(1, 3, 4, 5), deathdate2 = c("2007/04/10", 
"NA", "NA", "2016/01/02"), admidate2 = c("2007/03/08", "2012/10/20", 
"2017/10/14", "2015/12/20")), class = "data.frame", row.names = c(NA, 
-4L))

I tried this

    deathbefore <- with(df,(!is.na(deathdate))& !is.na(admidate)& deathdate < admidate)

df2 <- df[-deathbefore,]

However, it doesn't solve the problem.

Upvotes: 0

Views: 482

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389012

Change the dates to date object and select rows where deathdate > admidate or has a NA value.

library(dplyr)

df %>%
  mutate(across(contains('date'), na_if, "NA"),
        across(contains('date'), lubridate::ymd)) %>%
  filter(deathdate > admidate | is.na(deathdate) | is.na(admidate))

#  id  deathdate   admidate
#1  1 2007-04-10 2007-03-08
#2  3       <NA> 2012-10-20
#3  4       <NA> 2017-10-14
#4  5 2016-01-02 2015-12-20

Upvotes: 1

Chris Ruehlemann
Chris Ruehlemann

Reputation: 21400

You can use lubridate and difftime:

library(lubridate)
library(dplyr)
df %>%
  mutate(diff = difftime(ymd(admidate),ymd(deathdate)) > 0) %>%
  filter(!diff == "TRUE"|deathdate == "NA") %>%
  select(-diff)
  id  deathdate   admidate
1  1 2007/04/10 2007/03/08
2  3         NA 2012/10/20
3  4         NA 2017/10/14
4  5 2016/01/02 2015/12/20

(Warning can be neglected)

EDIT:

Your solution is not far from working, just add which:

deathbefore <- with(df,(!is.na(deathdate))& !is.na(admidate) & deathdate < admidate)

df2 <- df[-which(deathbefore),]

Data:

df <- structure(list(id = c(1, 1, 2, 3, 4, 4, 5), deathdate = c("2007/04/10", 
                                                                "2007/04/10", "2004/04/01", "NA", "NA", "2018/01/01", "2016/01/02"
), admidate = c("2007/03/08", "2007/04/11", "2004/04/15", "2012/10/20", 
                "2017/10/14", "2018/01/02", "2015/12/20")), class = "data.frame", row.names = c(NA, 
                                                                                                -7L))

Upvotes: 0

Related Questions