Nick Brown
Nick Brown

Reputation: 65

Find non-matching values across multiple columns in R

I have a table similar to this:

library(tidyverse)
tribble(~PatientID, ~A1.recipient, ~A2.recipient, ~DonorID, ~A1.donor, ~A2.donor,
  #--|---|---|---|---|---
 101, "1", "3", 201, "3", "1",
 102, "1", "2", 202, "2", "68",
 103, "68", "69", 203, "2", "3",
 104, "1", "2", 204, NA, NA,
 )

I would like to use R to find values in the two donor columns that are not in either of the two patient columns. My desired output would look something like this:

PatientID A1.recipient A2.recipient DonorID A1.donor A2.donor A1.mismatch A2.mismatch
101 1 3 201 3 1 NA NA
102 1 2 202 2 68 68 NA
103 68 69 203 2 3 2 3
104 1 2 204 NA NA NA NA

I think I need to use the across function, but I'm not sure how to do it. Solutions written in dplyr would be great, as that's what I'm trying to learn.

Upvotes: 1

Views: 1139

Answers (3)

GuedesBF
GuedesBF

Reputation: 9858

We want to extract the 'donor' value in all case_when it is not %in% the "recipient" values. We can call a function across the columns that ends_with "donor", then compare the values of each .x with each of the columns that ends_with "recipient".

Your expected output is likely wrong, I suspect you have exchanged A1 for A2 in the 'mismatch' columns.

This will work with any number of donor or recipient columns.

library(dplyr)

df %>% rowwise() %>%
        mutate(across(ends_with('donor'), ~case_when(!.x %in% c_across(ends_with('recipient')) ~ .x), .names='{.col}_mismatch'))

# A tibble: 4 x 8
# Rowwise: 
  PatientID A1.recipient A2.recipient DonorID A1.donor A2.donor A1.donor_mismatch A2.donor_mismatch
      <dbl> <chr>        <chr>          <dbl> <chr>    <chr>    <chr>             <chr>            
1       101 1            3                201 3        1        NA                NA               
2       102 1            2                202 2        68       NA                68               
3       103 68           69               203 2        3        2                 3                
4       104 1            2                204 NA       NA       NA                NA    

Upvotes: 1

akrun
akrun

Reputation: 887118

Using base R

nm1 <-  sub("donor", "mismatch", names(df1)[endsWith(names(df1), 'donor')])

df1[nm1] <- lapply(df1[endsWith(names(df1), 'donor')], function(x) 
    replace(x, Reduce(`|`, lapply(df1[endsWith(names(df1), 'recipient')], 
       function(y) x == y)), NA))

-output

 df1
# A tibble: 4 x 8
  PatientID A1.recipient A2.recipient DonorID A1.donor A2.donor A1.mismatch A2.mismatch
      <dbl> <chr>        <chr>          <dbl> <chr>    <chr>    <chr>       <chr>      
1       101 1            3                201 3        1        <NA>        <NA>       
2       102 1            2                202 2        68       <NA>        68         
3       103 68           69               203 2        3        2           3          
4       104 1            2                204 <NA>     <NA>     <NA>        <NA>    

Upvotes: 1

TarJae
TarJae

Reputation: 78927

df %>% 
    mutate(A1.mismatch = case_when(A1.donor == A1.recipient | A1.donor == A2.recipient ~ NA_integer_,
                                            TRUE ~ as.integer(A1.donor))) %>% 
    mutate(A2.mismatch = case_when(A2.donor == A1.recipient | A2.donor == A2.recipient ~ NA_integer_,
                                   TRUE ~ as.integer(A2.donor)))

Upvotes: 2

Related Questions