Row-wise comparisons of list columns

Question

Given a data frame dat with two list columns of character vectors, I would like to use mutate() to create a new column containing the elements of copub that are not present in secondary_report_ids.

I've failed thus far to find a solution using purrr::map() to apply setdiff() in a row-wise fashion.

dat <- structure(list(unique_study_id = c("13", "21", "3", "2", "78"
    ), srdr_id = c("174212", "172787", "174230", "174200", "174408"
    ), secondary_report_ids = list("174299", NA_character_, c("174081", 
    "174817", "174804", "172844", "172845"), c("175114", "174839", 
    "174240"), c("174094", "172575")), copub = list(c("174299", "174202", 
    "174283"), c("172567", "172566", "172621"), c("174817", "174804", 
    "172844", "172845", "174081", "174080", "174079"), c("172501", 
    "172961", "174564", "175114", "172498", "174839", "174240"), 
        c("172575", "174094"))), class = c("spec_tbl_df", "tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -5L))

Ronak Shah · Accepted Answer

We can use map2_chr() from purrr to apply setdiff() to each pair of list elements:

library(dplyr)
library(purrr)

dat %>%
  mutate(new_col = map2_chr(copub, secondary_report_ids, ~toString(setdiff(.x, .y))))

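# unique_study_id srdr_id secondary_report_ids copub     new_col                       
#  <chr>           <chr>   <list>               <list>    <chr>                         
#1 13              174212  <chr [1]>            <chr [3]> 174202, 174283                
#2 21              172787  <chr [1]>            <chr [3]> 172567, 172566, 172621        
#3 3               174230  <chr [5]>            <chr [7]> 174080, 174079                
#4 2               174200  <chr [3]>            <chr [7]> 172501, 172961, 174564, 172498
#5 78              174408  <chr [2]>            <chr [2]> ""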

The above gives a single comma-separated string for every row.
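
To see what the lambda does for a single row, here is a minimal base R sketch using the values from rows 1, 2 and 5 of dat. The names row1, row2 and row5 are only for illustration and are not part of the answer.

```r
# Row 1: the one shared ID is dropped, the rest are collapsed into one string
row1 <- toString(setdiff(c("174299", "174202", "174283"), "174299"))
row1
# [1] "174202, 174283"

# Row 2: secondary_report_ids is NA, so nothing matches and copub is kept whole
row2 <- toString(setdiff(c("172567", "172566", "172621"), NA_character_))
row2
# [1] "172567, 172566, 172621"

# Row 5: every ID is shared, so setdiff() returns character(0)
# and toString() turns that into an empty string
row5 <- toString(setdiff(c("172575", "174094"), c("174094", "172575")))
row5
# [1] ""
```

Because map2_chr() requires exactly one string per row, the toString() call is what makes this variant work; without it, rows with zero or several remaining IDs would raise an error.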

If you want the result to be a list column instead, use map2():

dat %>%
  mutate(new_col = map2(copub, secondary_report_ids, setdiff))

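# unique_study_id srdr_id secondary_report_ids copub     new_col  
#  <chr>           <chr>   <list>               <list>    <list>   
#1 13              174212  <chr [1]>            <chr [3]> <chr [2]>
#2 21              172787  <chr [1]>            <chr [3]> <chr [3]>
#3 3               174230  <chr [5]>            <chr [7]> <chr [2]>
#4 2               174200  <chr [3]>            <chr [7]> <chr [4]>
#5 78              174408  <chr [2]>            <chr [2]> <chr [0]>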
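
As a possible alternative to map2(), dplyr's rowwise() can express the same row-by-row setdiff(). This is only a sketch, assuming dplyr 1.0.0 or later; toy is a made-up stand-in that reuses rows 1 and 5 of the question's data.

```r
library(dplyr)

# Small stand-in for dat with the same two list columns
toy <- tibble(
  secondary_report_ids = list("174299", c("174094", "172575")),
  copub = list(c("174299", "174202", "174283"), c("172575", "174094"))
)

res <- toy %>%
  rowwise() %>%   # each row becomes its own group, list cells are unwrapped
  mutate(new_col = list(setdiff(copub, secondary_report_ids))) %>%  # list() keeps a list column
  ungroup()

# Number of remaining IDs per row
lengths(res$new_col)
# [1] 2 0
```

The result is a list column either way; lengths() is a quick check of how many IDs survive in each row.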

In base R, mapply() does the same job. For the comma-separated strings:

mapply(function(x, y) toString(setdiff(x, y)), dat$copub, dat$secondary_report_ids)

and for the list version:

mapply(setdiff, dat$copub, dat$secondary_report_ids)
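
One caveat worth noting: mapply() simplifies its result by default, so if every row's difference happened to have the same length it would return a matrix rather than a list. With dat the lengths differ, so a list comes back, but passing SIMPLIFY = FALSE (or using Map()) makes that explicit. A sketch with made-up two-row data, where copub and secondary are hypothetical vectors and not the question's columns:

```r
# Hypothetical data where both rows leave exactly two elements
copub     <- list(c("a", "b", "c"), c("d", "e", "f"))
secondary <- list("a", "d")

# Default SIMPLIFY = TRUE: equal-length results are collapsed into a matrix
m <- mapply(setdiff, copub, secondary)
is.matrix(m)
# [1] TRUE

# SIMPLIFY = FALSE keeps one list element per row, suitable for a list column
l <- mapply(setdiff, copub, secondary, SIMPLIFY = FALSE)
is.list(l)
# [1] TRUE

# Map() is a wrapper around mapply() with SIMPLIFY = FALSE
l2 <- Map(setdiff, copub, secondary)
```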
