tacrolimus
tacrolimus

Reputation: 530

filtering out rows of data using dates within each row using R

Representative dataset:

ID,drug_start,date_of_bloods,result
1234,NA,12-10-2010,80
1234,NA,12-10-2011,50
1234,NA,12-10-2013,10
4532,05-03-2015,01-01-2013,80
4532,05-03-2015,01-01-2014,60
4532,05-03-2015,01-01-2016,40
7894,04-09-2016,03-08-2012,40
7894,04-09-2016,03-08-2014,38
7894,04-09-2016,02-06-2015,30
7894,04-09-2016,29-10-2016,27
7894,04-09-2016,10-10-2017,26

I would like to filter this to result >20 and those rows where the date_of_bloods is < drug_start within each row. Where NA is in the drug_start column I would like them just filtered on result >20. Desired outcome would be:

ID,drug_start,date_of_bloods,result
1234,NA,12-10-2010,80
1234,NA,12-10-2011,50
4532,05-03-2015,01-01-2013,80
4532,05-03-2015,01-01-2014,60
7894,04-09-2016,03-08-2012,40
7894,04-09-2016,03-08-2014,38
7894,04-09-2016,02-06-2015,30

So far I have tried :

table <- readxl::read_excel("data.xlsx", sheet = "bloods")
table <- as.data.frame(table)
table$result <- as.numeric(table$result)
table<- subset(table, result >20)
table$date_of_bloods <- as.Date(table$date_of_bloods, format="%d/%m/%Y")
table$drug_start <- as.Date(table$drug_start, format="%d/%m/%Y")
table[!(table$drug_start >= table$date_of_bloods),]

The final line does not work and just turns all the rows where drug_start is NA to all columns reading NA and the dates not having filtered.

Your help would be greatly welcomed

Upvotes: 0

Views: 93

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388817

You can convert the dates to actual and date and use subset to select rows where result is greater than 20 and date_of_bloods < drug_start or drug_start is NA.

table$date_of_bloods <- as.Date(table$date_of_bloods, format="%d-%m-%Y")
table$drug_start <- as.Date(table$drug_start, format="%d-%m-%Y")

subset(table, result > 20 & (date_of_bloods < drug_start | is.na(drug_start)))

#    ID drug_start date_of_bloods result
#1 1234       <NA>     2010-10-12     80
#2 1234       <NA>     2011-10-12     50
#4 4532 2015-03-05     2013-01-01     80
#5 4532 2015-03-05     2014-01-01     60
#7 7894 2016-09-04     2012-08-03     40
#8 7894 2016-09-04     2014-08-03     38
#9 7894 2016-09-04     2015-06-02     30

The same logic using dplyr :

library(dplyr)

table %>%
  mutate(across(c(drug_start, date_of_bloods), lubridate::dmy)) %>%
  filter(result > 20 & (date_of_bloods < drug_start | is.na(drug_start)))

Upvotes: 2

Related Questions