Guillaume

Reputation: 167

Is there a way to identify rows that match a condition several times across several columns in R?

I have a dataset of questionnaires filled in by patients, and I want to classify the patients using diagnostic criteria. The criterion I'm struggling with requires at least 3 answers of >= 3 (the questions are Likert items scored from 1 to 5).

An MWE of the dataset I'm working on is presented below:

data <- structure(list(q1 = c(1, 2, 3, 1, 1, 1, 1, 3, 1, 1), q2 = c(1, 
 1, 3, 1, 1, 1, 1, 3, 1, 1), q3 = c(1, 1, 1, 1, 3, 3, 1, 1, 
 1, 1), q4 = c(1, 2, 2, 1, 1, 3, 1, 3, 1, 1), q5 = c(1, 1, 
 3, 1, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -10L), class = c("tbl_df", 
 "tbl", "data.frame"))

I've figured out how to identify observations with at least one value >= 3 using the following (I do not use all_vars, as my dataset is larger than the MWE):

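library(dplyr)
# Keep rows where at least one of q1..q5 is 3, 4 or 5
data.match <- data %>%
   filter_at(vars(q1, q2, q3, q4, q5), any_vars(. %in% c(3:5)))
# Assumes an `id` column, which the MWE above lacks
data$diagnostic <- ifelse(data$id %in% data.match$id, 1, 0)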

I then flagged the matching patients using the second line. However, I have not been able to adapt this strategy to identify patients with a given number of answers meeting the threshold across columns. In this specific example, I'd like to identify patients 3 and 8. I've tried using rowSums, but it seems to me that the number of possible combinations is too high.

Upvotes: 1

Views: 69

Answers (2)

Ronak Shah

Reputation: 389335

Using dplyr, you could use rowwise with c_across:

library(dplyr)

result <- data %>%
  rowwise() %>%
  mutate(diagnostic = as.integer(sum(c_across(starts_with('q')) >= 3) >= 3)) %>%
  ungroup()  # drop the rowwise grouping so later steps work on the whole table

result

#      q1    q2    q3    q4    q5 diagnostic
#   <dbl> <dbl> <dbl> <dbl> <dbl>      <int>
# 1     1     1     1     1     1          0
# 2     2     1     1     2     1          0
# 3     3     3     1     2     3          1
# 4     1     1     1     1     1          0
# 5     1     1     3     1     1          0
# 6     1     1     3     3     1          0
# 7     1     1     1     1     1          0
# 8     3     3     1     3     1          1
# 9     1     1     1     1     1          0
#10     1     1     1     1     1          0

Upvotes: 1

akrun

Reputation: 887981

Perhaps we can use rowSums:

# Count answers >= 3 per row, then flag rows with at least 3 of them
data$diagnostic <- +(rowSums(data >= 3) >= 3)
data$diagnostic
#[1] 0 0 1 0 0 0 0 1 0 0
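
If the real dataset has extra columns (an id, for example), the comparison can be limited to the question columns so that those extra columns are not counted. Here is a minimal sketch, assuming the question columns are all named "q" followed by digits and that the criterion is "at least 3 answers of 3 or more". The `id` column and the names `q_cols` and `n_high` are made up for illustration and are not part of the original data:

```r
# Toy data: the MWE from the question plus a hypothetical id column
data <- data.frame(
  id = 1:10,
  q1 = c(1, 2, 3, 1, 1, 1, 1, 3, 1, 1),
  q2 = c(1, 1, 3, 1, 1, 1, 1, 3, 1, 1),
  q3 = c(1, 1, 1, 1, 3, 3, 1, 1, 1, 1),
  q4 = c(1, 2, 2, 1, 1, 3, 1, 3, 1, 1),
  q5 = c(1, 1, 3, 1, 1, 1, 1, 1, 1, 1)
)

# Select only the questionnaire columns (assumed naming: q1, q2, ...)
q_cols <- grep("^q[0-9]+$", names(data), value = TRUE)

# Count answers >= 3 per row; na.rm = TRUE means a missing answer
# is simply not counted towards the threshold
n_high <- rowSums(data[q_cols] >= 3, na.rm = TRUE)

# Flag rows with at least 3 such answers
data$diagnostic <- as.integer(n_high >= 3)

# ids of the flagged patients
data$id[data$diagnostic == 1]
#[1] 3 8
```

Whether a missing answer should count against the patient is a decision for the study, so `na.rm = TRUE` is only one possible choice here.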

Upvotes: 1
