Reputation: 167
I have a dataset of questionnaires filled in by patients, and I want to classify the patients using diagnostic criteria. The criterion I'm struggling with requires at least 3 answers with a value >= 3 (the questions are Likert items scored from 1 to 5).
An MWE of the dataset I'm working on is presented below:
data <- structure(list(q1 = c(1, 2, 3, 1, 1, 1, 1, 3, 1, 1), q2 = c(1,
1, 3, 1, 1, 1, 1, 3, 1, 1), q3 = c(1, 1, 1, 1, 3, 3, 1, 1,
1, 1), q4 = c(1, 2, 2, 1, 1, 3, 1, 3, 1, 1), q5 = c(1, 1,
3, 1, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
I've figured out how to identify observations that have at least 1 value >= 3 using the following (I do not use all_vars, as my dataset is larger than the MWE):
data.match <- data %>%
  filter_at(vars(q1, q2, q3, q4, q5), any_vars(. %in% c(3:5)))
# note: this relies on an `id` column, which the MWE above does not include
data$diagnostic <- ifelse(data$id %in% data.match$id, 1, 0)
I then flagged the matching patients in the original data with the second statement. However, I have not been able to adapt this strategy to identify patients who have at least a given number of qualifying values across columns. In this specific example, I'd like to identify patients 3 and 8. I've tried using rowSums, but it seems to me that the number of possible combinations is too high.
Upvotes: 1
Views: 69
Reputation: 389335
Using dplyr, you could use rowwise with c_across:
library(dplyr)
result <- data %>%
  rowwise() %>%
  mutate(diagnostic = as.integer(sum(c_across(starts_with('q')) >= 3) >= 3))
result
# q1 q2 q3 q4 q5 diagnostic
# <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 1 1 1 1 1 0
# 2 2 1 1 2 1 0
# 3 3 3 1 2 3 1
# 4 1 1 1 1 1 0
# 5 1 1 3 1 1 0
# 6 1 1 3 3 1 0
# 7 1 1 1 1 1 0
# 8 3 3 1 3 1 1
# 9 1 1 1 1 1 0
#10 1 1 1 1 1 0
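On larger datasets rowwise() can be slow, since it evaluates the expression one row at a time. A vectorised sketch of the same idea is shown here; it assumes that every questionnaire column starts with "q" (as in the MWE) and rebuilds the example data so that it runs on its own. The name result_vec is only illustrative.

```r
library(dplyr)

# MWE data from the question, rebuilt as a plain tibble
data <- tibble(
  q1 = c(1, 2, 3, 1, 1, 1, 1, 3, 1, 1),
  q2 = c(1, 1, 3, 1, 1, 1, 1, 3, 1, 1),
  q3 = c(1, 1, 1, 1, 3, 3, 1, 1, 1, 1),
  q4 = c(1, 2, 2, 1, 1, 3, 1, 3, 1, 1),
  q5 = c(1, 1, 3, 1, 1, 1, 1, 1, 1, 1)
)

# across() returns the selected columns as a tibble; comparing it with 3
# gives a logical matrix, and rowSums() counts the TRUE values in each row
result_vec <- data %>%
  mutate(diagnostic = as.integer(rowSums(across(starts_with("q")) >= 3) >= 3))

# Row numbers of the patients meeting the criterion
which(result_vec$diagnostic == 1)
# [1] 3 8
```

The inner `>= 3` is the answer threshold and the outer `>= 3` is the required number of answers, so either one can be changed independently.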
Upvotes: 1
Reputation: 887981
Perhaps we can use rowSums:
data$diagnostic <- +(rowSums(data >= 3) >= 3)
data$diagnostic
#[1] 0 0 1 0 0 0 0 1 0 0
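If the real dataset contains columns other than the questions, comparing the whole data frame would count those columns too. The sketch below restricts the count to the questionnaire columns. It makes two assumptions that are not in the question: a hypothetical `id` column standing in for the extra columns, and question columns named q1, q2, and so on. Using na.rm = TRUE is also a choice made here; it treats a missing answer as not reaching the threshold.

```r
# MWE data from the question, plus a hypothetical `id` column
# standing in for the non-questionnaire columns of a larger dataset
data <- data.frame(
  id = 1:10,
  q1 = c(1, 2, 3, 1, 1, 1, 1, 3, 1, 1),
  q2 = c(1, 1, 3, 1, 1, 1, 1, 3, 1, 1),
  q3 = c(1, 1, 1, 1, 3, 3, 1, 1, 1, 1),
  q4 = c(1, 2, 2, 1, 1, 3, 1, 3, 1, 1),
  q5 = c(1, 1, 3, 1, 1, 1, 1, 1, 1, 1)
)

# Keep only the questionnaire columns (assumed naming: q followed by digits)
q_cols <- grep("^q[0-9]+$", names(data), value = TRUE)

# Number of answers >= 3 per patient, ignoring missing answers
n_high <- rowSums(data[q_cols] >= 3, na.rm = TRUE)

# 1 if the patient has at least 3 such answers, 0 otherwise
data$diagnostic <- as.integer(n_high >= 3)

# Identifiers of the patients meeting the criterion
data$id[data$diagnostic == 1]
# [1] 3 8
```

Without the column selection, every `id` of 3 or more would be counted as a qualifying answer.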
Upvotes: 1