ccalle
ccalle

Reputation: 53

Mutate with grepl over multiple columns with operations

I need to create a new variable based on columns that depend on each other but can't seem to get it to work. I have one way of solving this but this would mean writing the same code 259 times.

I want to create a new variable which includes the date of the latest prescription of certain drugs if its within 180 days of hospital admission. The variable date_prescription_n is the date of prescription for medication_n. There is only one date for hospital admission, the hosp_date variable. The dataframe is in a long format.

My solution:

drugvector <- c("drug1", "drug2", "drug3", "drug4")
dataframe <- dataframe %>%
             mutate(latest_prescription_of_drugvector = case_when(
grepl(paste(drugvector, collapse = "|"), medication_1) & hosp_date - date_prescription1 >181) ~ date_prescription1,
grepl(paste(drugvector, collapse = "|"), medication_2) & hosp_date - date_prescription2 >181) ~ date_prescription2,
grepl(paste(drugvector, collapse = "|"), medication_3) & hosp_date - date_prescription3 >181) ~ date_prescription3))

The problem is that I have medication_1 to medication_259 and date_prescription1 to date_prescription259. Luckily _1 is the earliest medication/prescription and 259 is the latest one which means if I just copy and paste the code above until I get 259 lines I should have the date of the latest drug prescriped. This however seems by far one of the most non-efficient ways of doing so.

Expected output:

> dataframe
   hosp_date             medication_1      medication_2         medication_3        date_prescription1       date_prescription2     date_prescription3       latest_prescription_of_drugvector
1 2014-03-27              drug1                drug5                drug7                    2014-01-25              2014-02-20         2014-02-25               2014-01-25
2 2015-03-21              drug6                drug8                drug3                    2014-05-27              2014-08-21         2015-01-22                2015-01-22
3 2017-12-21              drug3                drug2                drug7                    2012-01-26              2012-03-12         2013-01-27                   NA

Upvotes: 1

Views: 429

Answers (1)

akrun
akrun

Reputation: 887691

We may reshape to 'long' format with pivot_longer and then create the column and bind it with original dataset

library(dplyr)
library(tidyr)
library(stringr)
df1 %>% 
  mutate(rn = row_number()) %>%
  rename_with(~str_remove(., "_(?=\\d+)")) %>% 
  pivot_longer(cols = -c(rn, hosp_date), names_to = c(".value", 
       "grp"), names_sep = "(?<=[a-z])(?=[0-9])") %>% 
  mutate(across(c(hosp_date, date_prescription), as.Date)) %>% 
  filter(medication %in% drugvector) %>%
  group_by(grp) %>%
  summarise(latest_prescription_of_drugvector =  
   case_when(hosp_date - date_prescription >181 ~ date_prescription), .groups = 'keep') %>%
  summarise(latest_prescription_of_drugvector = if(all(is.na(latest_prescription_of_drugvector))) first(latest_prescription_of_drugvector) else 
  latest_prescription_of_drugvector[!is.na(latest_prescription_of_drugvector)][1]) %>% select(latest_prescription_of_drugvector) %>% bind_cols(df1, .)

data

df1 <- structure(list(hosp_date = c("2014-03-27", "2015-03-21", "2017-12-21"
), medication_1 = c("drug1", "drug6", "drug3"), medication_2 = c("drug5", 
"drug8", "drug2"), medication_3 = c("drug7", "drug3", "drug7"
), date_prescription1 = c("2014-01-25", "2014-05-27", "2012-01-26"
), date_prescription2 = c("2014-02-20", "2014-08-21", "2012-03-12"
), date_prescription3 = c("2014-02-25", "2015-01-22", "2013-01-27"
)), row.names = c("1", "2", "3"), class = "data.frame")

Upvotes: 1

Related Questions