Reputation: 345
I have some data such as this
structure(list(id = c(1, 1, 2, 3, 4, 4, 5), deathdate = c("2007/04/10",
"2007/04/10", "2004/04/01", "NA", "NA", "2018/01/01", "2016/01/02"
), admidate = c("2007/03/08", "2007/04/11", "2004/04/15", "2012/10/20",
"2017/10/14", "2018/01/02", "2015/12/20")), class = "data.frame", row.names = c(NA,
-7L))
and I want the rows where the death date is less than admidate to be removed from the new df such as this
structure(list(id2 = c(1, 3, 4, 5), deathdate2 = c("2007/04/10",
"NA", "NA", "2016/01/02"), admidate2 = c("2007/03/08", "2012/10/20",
"2017/10/14", "2015/12/20")), class = "data.frame", row.names = c(NA,
-4L))
I tried this
deathbefore <- with(df,(!is.na(deathdate))& !is.na(admidate)& deathdate < admidate)
df2 <- df[-deathbefore,]
However, it doesn't solve the problem.
Upvotes: 0
Views: 482
Reputation: 389012
Change the dates to date object and select rows where deathdate > admidate
or has a NA
value.
library(dplyr)
df %>%
mutate(across(contains('date'), na_if, "NA"),
across(contains('date'), lubridate::ymd)) %>%
filter(deathdate > admidate | is.na(deathdate) | is.na(admidate))
# id deathdate admidate
#1 1 2007-04-10 2007-03-08
#2 3 <NA> 2012-10-20
#3 4 <NA> 2017-10-14
#4 5 2016-01-02 2015-12-20
Upvotes: 1
Reputation: 21400
You can use lubridate
and difftime
:
library(lubridate)
library(dplyr)
df %>%
mutate(diff = difftime(ymd(admidate),ymd(deathdate)) > 0) %>%
filter(!diff == "TRUE"|deathdate == "NA") %>%
select(-diff)
id deathdate admidate
1 1 2007/04/10 2007/03/08
2 3 NA 2012/10/20
3 4 NA 2017/10/14
4 5 2016/01/02 2015/12/20
(Warning can be neglected)
EDIT:
Your solution is not far from working, just add which
:
deathbefore <- with(df,(!is.na(deathdate))& !is.na(admidate) & deathdate < admidate)
df2 <- df[-which(deathbefore),]
Data:
df <- structure(list(id = c(1, 1, 2, 3, 4, 4, 5), deathdate = c("2007/04/10",
"2007/04/10", "2004/04/01", "NA", "NA", "2018/01/01", "2016/01/02"
), admidate = c("2007/03/08", "2007/04/11", "2004/04/15", "2012/10/20",
"2017/10/14", "2018/01/02", "2015/12/20")), class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 0