Reputation: 115
I have a data frame about whether a patient meets the study criteria, and each row is a patient, each column is a criterion. So some columns are inclusion criteria and some columns are exclusion criteria, and I want to output the reasons for ineligibility. For example,
test <- data.frame(A = c(0, 0, 1),
B = c(0, 0, 0),
C = c(0, 1, 1),
D = c(1, 0, 0),
E = c(1, 0, 1))
where A, B, C are inclusion criteria and D, E are exclusion criteria, and I want to output the column names (could be more than one) if the inclusion criteria == 0 or exclusion criteria == 1.
The expected output would be
output <- data.frame(A = c(0, 0, 1),
B = c(0, 0, 0),
C = c(0, 1, 1),
D = c(1, 0, 0),
E = c(1, 0, 1),
failed_incl = c("A, B, C", "A, B", "B"),
failed_excl = c("D, E", "", "E"))
Is there a way to do it efficiently without having to write out every possible scenario? The actual data frame has much more columns.
Upvotes: 1
Views: 632
Reputation: 887068
There are multiply ways. An option is to use apply
to loop over the rows (MARGIN = 1
), get the names
of the logical vector (x== 0
) and paste
them together
test$failed_incl <- apply(test[1:3], 1, function(x) toString(names(x)[x == 0]))
test$failed_excl <- apply(test[4:5], 1, function(x) toString(names(x)[x == 1]))
-output
test
# A B C D E failed_incl failed_excl
#1 0 0 0 1 1 A, B, C D, E
#2 0 0 1 0 0 A, B
#3 1 0 1 0 1 B E
Or using tidyverse
library(dplyr)
test %>%
rowwise %>%
mutate(failed_incl = toString(names(.)[which(c_across(A:C) == 0)]),
failed_excl = toString(c('D', 'E')[which(c_across(D:E) == 1)])) %>%
ungroup
# A tibble: 3 x 7
# A B C D E failed_incl failed_excl
# <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#1 0 0 0 1 1 A, B, C "D, E"
#2 0 0 1 0 0 A, B ""
#3 1 0 1 0 1 B "E"
Upvotes: 1
Reputation: 21264
Here's a tidyverse approach, which pivots the column names and then summarizes based on the inclusion/exclusion conditions:
failed_df <-
test %>%
add_rownames() %>%
pivot_longer(-rowname) %>%
group_by(rowname) %>%
summarise(failed_incl = paste(name[value == 0 & name %in% c("A", "B", "C")], collapse = ", "),
failed_excl = paste(name[value == 1 & name %in% c("D", "E")], collapse = ", ")) %>%
select(-rowname)
bind_cols(test, failed_df)
A B C D E failed_incl failed_excl
1 0 0 0 1 1 A, B, C D, E
2 0 0 1 0 0 A, B
3 1 0 1 0 1 B E
There may be a more elegant way to do this with rowwise
and c_across
.
Upvotes: 0