Reputation: 65
I have a table similar to this:
library(tidyverse)
tribble(~PatientID, ~A1.recipient, ~A2.recipient, ~DonorID, ~A1.donor, ~A2.donor,
#--|---|---|---|---|---
101, "1", "3", 201, "3", "1",
102, "1", "2", 202, "2", "68",
103, "68", "69", 203, "2", "3",
104, "1", "2", 204, NA, NA,
)
I would like to use R to find values in the two donor columns that are not in either of the two patient columns. My desired output would look something like this:
PatientID | A1.recipient | A2.recipient | DonorID | A1.donor | A2.donor | A1.mismatch | A2.mismatch |
---|---|---|---|---|---|---|---|
101 | 1 | 3 | 201 | 3 | 1 | NA | NA |
102 | 1 | 2 | 202 | 2 | 68 | 68 | NA |
103 | 68 | 69 | 203 | 2 | 3 | 2 | 3 |
104 | 1 | 2 | 204 | NA | NA | NA | NA |
I think I need to use the across
function, but I'm not sure how to do it. Solutions written in dplyr would be great, as that's what I'm trying to learn.
Upvotes: 1
Views: 1139
Reputation: 9858
We want to extract the 'donor' value in all case_when
it is not %in%
the "recipient" values. We can call a function across
the columns that ends_with
"donor", then compare the values of each .x
with each of the columns that ends_with
"recipient".
Your expected output is likely wrong, I suspect you have exchanged A1 for A2 in the 'mismatch' columns.
This will work with any number of donor or recipient columns.
library(dplyr)
df %>% rowwise() %>%
mutate(across(ends_with('donor'), ~case_when(!.x %in% c_across(ends_with('recipient')) ~ .x), .names='{.col}_mismatch'))
# A tibble: 4 x 8
# Rowwise:
PatientID A1.recipient A2.recipient DonorID A1.donor A2.donor A1.donor_mismatch A2.donor_mismatch
<dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
1 101 1 3 201 3 1 NA NA
2 102 1 2 202 2 68 NA 68
3 103 68 69 203 2 3 2 3
4 104 1 2 204 NA NA NA NA
Upvotes: 1
Reputation: 887118
Using base R
nm1 <- sub("donor", "mismatch", names(df1)[endsWith(names(df1), 'donor')])
df1[nm1] <- lapply(df1[endsWith(names(df1), 'donor')], function(x)
replace(x, Reduce(`|`, lapply(df1[endsWith(names(df1), 'recipient')],
function(y) x == y)), NA))
-output
df1
# A tibble: 4 x 8
PatientID A1.recipient A2.recipient DonorID A1.donor A2.donor A1.mismatch A2.mismatch
<dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
1 101 1 3 201 3 1 <NA> <NA>
2 102 1 2 202 2 68 <NA> 68
3 103 68 69 203 2 3 2 3
4 104 1 2 204 <NA> <NA> <NA> <NA>
Upvotes: 1
Reputation: 78927
df %>%
mutate(A1.mismatch = case_when(A1.donor == A1.recipient | A1.donor == A2.recipient ~ NA_integer_,
TRUE ~ as.integer(A1.donor))) %>%
mutate(A2.mismatch = case_when(A2.donor == A1.recipient | A2.donor == A2.recipient ~ NA_integer_,
TRUE ~ as.integer(A2.donor)))
Upvotes: 2