Reputation: 530
Representative dataset:
ID,drug_start,date_of_bloods,result
1234,NA,12-10-2010,80
1234,NA,12-10-2011,50
1234,NA,12-10-2013,10
4532,05-03-2015,01-01-2013,80
4532,05-03-2015,01-01-2014,60
4532,05-03-2015,01-01-2016,40
7894,04-09-2016,03-08-2012,40
7894,04-09-2016,03-08-2014,38
7894,04-09-2016,02-06-2015,30
7894,04-09-2016,29-10-2016,27
7894,04-09-2016,10-10-2017,26
I would like to filter this to result >20 and those rows where the date_of_bloods is < drug_start within each row. Where NA is in the drug_start column I would like them just filtered on result >20. Desired outcome would be:
ID,drug_start,date_of_bloods,result
1234,NA,12-10-2010,80
1234,NA,12-10-2011,50
4532,05-03-2015,01-01-2013,80
4532,05-03-2015,01-01-2014,60
7894,04-09-2016,03-08-2012,40
7894,04-09-2016,03-08-2014,38
7894,04-09-2016,02-06-2015,30
So far I have tried :
table <- readxl::read_excel("data.xlsx", sheet = "bloods")
table <- as.data.frame(table)
table$result <- as.numeric(table$result)
table<- subset(table, result >20)
table$date_of_bloods <- as.Date(table$date_of_bloods, format="%d/%m/%Y")
table$drug_start <- as.Date(table$drug_start, format="%d/%m/%Y")
table[!(table$drug_start >= table$date_of_bloods),]
The final line does not work and just turns all the rows where drug_start is NA to all columns reading NA and the dates not having filtered.
Your help would be greatly welcomed
Upvotes: 0
Views: 93
Reputation: 388817
You can convert the dates to actual and date and use subset
to select rows where result
is greater than 20 and date_of_bloods < drug_start
or drug_start
is NA
.
table$date_of_bloods <- as.Date(table$date_of_bloods, format="%d-%m-%Y")
table$drug_start <- as.Date(table$drug_start, format="%d-%m-%Y")
subset(table, result > 20 & (date_of_bloods < drug_start | is.na(drug_start)))
# ID drug_start date_of_bloods result
#1 1234 <NA> 2010-10-12 80
#2 1234 <NA> 2011-10-12 50
#4 4532 2015-03-05 2013-01-01 80
#5 4532 2015-03-05 2014-01-01 60
#7 7894 2016-09-04 2012-08-03 40
#8 7894 2016-09-04 2014-08-03 38
#9 7894 2016-09-04 2015-06-02 30
The same logic using dplyr
:
library(dplyr)
table %>%
mutate(across(c(drug_start, date_of_bloods), lubridate::dmy)) %>%
filter(result > 20 & (date_of_bloods < drug_start | is.na(drug_start)))
Upvotes: 2