Reputation: 2253
I am looking for a function to repeat the following exercise. I have a large dataset p
in which each row correspond to a patient.
Each patient has dates corresponding to multiple MRI scans and dates corresponding to multiple clinical follow-ups. Each patient may have up to 20 follow-ups of each follow-up type.
The following column names are autogenerated in REDCap, and I admit they are unnecessarily long.
MRI Dates
p$mr_daterd
correspond to the first MRI scan used at diagnosis, and
each subsequent MRI scan is denoted as p$mr_daterd_fu1_v1
(first MRI follow-up), p$mr_daterd_fu1_v2
(second MRI follow-up), p$mr_daterd_v2_v3
(third MRI follow-up) ...
p$mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19_v20
(the 20th MRI follow-up)
Clinical follow-up
p$date_contact_1
correspond to the first clinical follow-up, and
each subsequent clinical follow-up is denoted as p$contact_date_2
(second clinical follow-up), p$contact_date_3
(third clinical follow-up) ...
p$contact_date_20
(the 20th clinical follow-up).
Hence the suffix _vX
and _x
matches sequentially between each MRI and clinical follow-up.
I need to replace NA / missing dates in the clinical follow-up, exclusively. I.e. if p$date_contact_x
is NA / missing, it should be replaced with the corresponding xth
MRI follow-up date.
Ala p$date_contact_x = ifelse(is.na(p$date_contact_x), p$mr_daterd_vx, p$date_contact_x)
.
However, I would prefer not to define each follow-up date as such, and rather include a function to do this repeatedly for each follow-up.
I am using dplyr
, so a solution compatible with this package is preferable.
Example
id date_contact_3 date_contact_7 mr_daterd_fu1_v2_v3 mr_daterd_fu1_v2_v3_v4_v5_v6_v7
1 10 <NA> <NA> <NA> 2009-03-16
2 14 <NA> <NA> 2012-03-09 <NA>
Expected output
id date_contact_3 date_contact_7 mr_daterd_fu1_v2_v3 mr_daterd_fu1_v2_v3_v4_v5_v6_v7
1 10 <NA> 2009-03-16 <NA> 2009-03-16
2 14 2012-03-09 <NA> 2012-03-09 <NA>
Data sample including all 20 MRI and clinical follow-ups for nine patients
p <- structure(list(id = 32:40, mr_daterd = structure(c(15271, 12958,
15236, 12467, 12958, 15125, 12958, 11541, 13696), class = "Date"),
mr_daterd_fu1_v1 = structure(c(15716, 15785, 15391, 16307,
15764, 15474, 15932, 11765, 13976), class = "Date"), mr_daterd_fu1_v2 = structure(c(16086,
16504, 15758, NA, 16602, 15836, 16652, 12169, 14389), class = "Date"),
mr_daterd_fu1_v2_v3 = structure(c(16451, NA, 16097, NA, NA,
16209, NA, 12538, 14821), class = "Date"), mr_daterd_fu1_v2_v3_v4 = structure(c(17323,
NA, 16511, NA, NA, 16564, NA, 12888, 15146), class = "Date"),
mr_daterd_fu1_v2_v3_v4_v5 = structure(c(18130, NA, 17974,
NA, NA, 17365, NA, 13241, 15496), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6 = structure(c(NA,
NA, NA, NA, NA, NA, NA, 13732, 16232), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7 = structure(c(NA,
NA, NA, NA, NA, NA, NA, NA, 17308), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8 = structure(c(NA,
NA, NA, NA, NA, NA, NA, 15243, NA), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9 = structure(c(NA,
NA, NA, NA, NA, NA, NA, 15693, NA), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10 = structure(c(NA,
NA, NA, NA, NA, NA, NA, 16421, NA), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19_v20 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_1 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_2 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_3 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_4 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_5 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_6 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_7 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_8 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_9 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_10 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_11 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_12 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_13 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_14 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_15 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_16 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_17 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_18 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_19 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_20 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date")), row.names = c(NA,
-9L), class = "data.frame")
Upvotes: 0
Views: 39
Reputation: 8844
Here is a dplyr
approach
library(dplyr)
library(stringr)
p %>%
mutate(across(
starts_with("date_contact"),
~{
vn <- paste0("v", str_match(cur_column(), "(?<=_)\\d+$"))
x <- cur_data_all() %>% select(starts_with("mr_daterd") & ends_with(vn)) %>% pull()
coalesce(., x)
}
))
Output
id mr_daterd mr_daterd_fu1_v1 mr_daterd_fu1_v2 mr_daterd_fu1_v2_v3 mr_daterd_fu1_v2_v3_v4 mr_daterd_fu1_v2_v3_v4_v5 mr_daterd_fu1_v2_v3_v4_v5_v6 mr_daterd_fu1_v2_v3_v4_v5_v6_v7 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19_v20 date_contact_1 date_contact_2 date_contact_3 date_contact_4 date_contact_5 date_contact_6 date_contact_7 date_contact_8 date_contact_9 date_contact_10 date_contact_11 date_contact_12 date_contact_13 date_contact_14 date_contact_15 date_contact_16 date_contact_17 date_contact_18 date_contact_19 date_contact_20
1 32 2011-10-24 2013-01-11 2014-01-16 2015-01-16 2017-06-06 2019-08-22 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2013-01-11 2014-01-16 2015-01-16 2017-06-06 2019-08-22 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 33 2005-06-24 2013-03-21 2015-03-10 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2013-03-21 2015-03-10 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 34 2011-09-19 2012-02-21 2013-02-22 2014-01-27 2015-03-17 2019-03-19 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2012-02-21 2013-02-22 2014-01-27 2015-03-17 2019-03-19 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 35 2004-02-19 2014-08-25 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2014-08-25 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 36 2005-06-24 2013-02-28 2015-06-16 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2013-02-28 2015-06-16 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 37 2011-05-31 2012-05-14 2013-05-11 2014-05-19 2015-05-09 2017-07-18 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2012-05-14 2013-05-11 2014-05-19 2015-05-09 2017-07-18 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 38 2005-06-24 2013-08-15 2015-08-05 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2013-08-15 2015-08-05 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 39 2001-08-07 2002-03-19 2003-04-27 2004-04-30 2005-04-15 2006-04-03 2007-08-07 <NA> 2011-09-26 2012-12-19 2014-12-17 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2002-03-19 2003-04-27 2004-04-30 2005-04-15 2006-04-03 2007-08-07 <NA> 2011-09-26 2012-12-19 2014-12-17 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 40 2007-07-02 2008-04-07 2009-05-25 2010-07-31 2011-06-21 2012-06-05 2014-06-11 2017-05-22 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2008-04-07 2009-05-25 2010-07-31 2011-06-21 2012-06-05 2014-06-11 2017-05-22 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
Upvotes: 1