Reputation: 165
I have a dataset in which each individual has 3 possible readings of systolic blood pressure (SBP) and 3 possible readings of diastolic blood pressure (DBP):
a <- data.frame(
  ID   = 1:10,
  SBP1 = c(120, 121, 122, NA, 123, 124, 145, NA, 101, 110),
  SBP2 = c(134, 124, NA, NA, 102, 133, 123, NA, NA, 109),
  SBP3 = c(111, 123, NA, NA, NA, 133, 132, 111, 110, 123),
  DBP1 = c(89, 90, 87, NA, 65, 98, 80, NA, 66, 65),
  DBP2 = c(90, 92, NA, NA, 65, 78, 88, NA, NA, 91),
  DBP3 = c(91, 93, NA, NA, NA, 92, 78, 88, 88, 54)
)
I would like to create two new variables (one for the SBP called 'SBP_new', and the other for the DBP called 'DBP_new') using the following rules:
I could split my dataset into 4 subsets, do the calculation in each one separately, and then combine the results.
But is there a more efficient way to do this?
Upvotes: 1
Views: 70
Reputation: 76402
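As @Ritchie Sacramento says in his comment on the question, compute the median in all cases, but set na.rm depending on whether all of a row's values are NA: NAs are removed when at least one reading is present, and a row with no readings at all still returns NA.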
i_sbp <- grep("SBP", names(a))
i_dbp <- grep("DBP", names(a))
a$SBP_new <- apply(a[i_sbp], 1, \(x) median(x, na.rm = any(!is.na(x))))
a$DBP_new <- apply(a[i_dbp], 1, \(x) median(x, na.rm = any(!is.na(x))))
Created on 2022-05-29 by the reprex package (v2.0.1)
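For anyone who wants to see the row-wise logic spelled out, here is a minimal sketch of the same idea wrapped in a function. The helper name row_median and the toy data frame are my own assumptions for illustration, not part of the answer above, and the values are made up rather than taken from the question's data.

```r
# Hypothetical helper (an assumption, not from the original answer):
# row-wise median over the columns whose names match `pattern`,
# ignoring NAs but returning NA when a row has no readings at all.
row_median <- function(df, pattern) {
  cols <- grep(pattern, names(df))  # indices of the matching columns
  apply(df[cols], 1, function(x) {
    if (all(is.na(x))) NA_real_ else median(x, na.rm = TRUE)
  })
}

# Small made-up data set, only to illustrate the four possible cases:
# three readings, one reading, no readings, two readings.
toy <- data.frame(
  ID   = 1:4,
  SBP1 = c(120, 122, NA, 101),
  SBP2 = c(134, NA, NA, NA),
  SBP3 = c(111, NA, NA, 110)
)

# The anchored pattern matches SBP1..SBP3 but not SBP_new, so the
# line can be re-run without the new column feeding into itself.
toy$SBP_new <- row_median(toy, "^SBP[0-9]+$")
print(toy)
```

With these toy values, SBP_new comes out as 120, 122, NA and 105.5: the median of three readings, the single available reading, NA for the empty row, and the mean of the two available readings.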
Upvotes: 1