Mel
Mel

Reputation: 71

Find the difference b/w two column elements in r

How can I put the diff elements b/w factor_Nov and factor_Jan in a new column called diff

 df=data.frame(id=c("1","2","3"),
                     factor_Nov=c("A|B|C","E","F|H|G"),
                     factor_Jan=c("B|H|E","E","X|Y|Z"))

the output should be

df=data.frame(id=c("1","2","3"),
                 factor_Nov=c("A|B|C","E","F|H|G"),
                 factor_Jan=c("B|H|E","E","X|Y|Z"),
                diff=c("A|C|H|E",NA,"X|Y|Z|F|H|G"))

I tried setdiff but that wasn't working

Upvotes: 2

Views: 51

Answers (3)

ThomasIsCoding
ThomasIsCoding

Reputation: 102241

A data.table option

setDT(df)[
  ,
  diff := do.call(
    Map,
    c(
      function(...) paste0(setdiff(union(...), intersect(...)), collapse = "|"),
      unname(lapply(.SD, strsplit, split = "\\|"))
    )
  ),
  .SDcols = patterns("^factor_")
]

gives

> df
   id factor_Nov factor_Jan        diff
1:  1      A|B|C      B|H|E     A|C|H|E
2:  2          E          E
3:  3      F|H|G      X|Y|Z F|H|G|X|Y|Z

Upvotes: 0

Duck
Duck

Reputation: 39613

With tidyverse:

library(dplyr)
library(tidyr)
#Code
new <- df %>% left_join(
  df %>% separate_rows(c(factor_Nov,factor_Jan)) %>%
  pivot_longer(-id) %>%
  group_by(id,value) %>%
  filter(n() == 1) %>%
  ungroup() %>% arrange(id,value) %>%
  group_by(id) %>%
  summarise(Diff=paste0(value,collapse = '|')))

Output:

  id factor_Nov factor_Jan        Diff
1  1      A|B|C      B|H|E     A|C|E|H
2  2          E          E        <NA>
3  3      F|H|G      X|Y|Z F|G|H|X|Y|Z

Upvotes: 1

akrun
akrun

Reputation: 887501

An option is to split the columns with strsplit, using delimiter as |, then use Map to get the elements that are not intersect, paste them with collapse = "|"

df$diff <- unlist(Map(function(x, y) paste(setdiff(union(x, y), 
   intersect(x, y)), collapse="|"),
   strsplit(as.character(df$factor_Nov), "|", fixed = TRUE),
   strsplit(as.character(df$factor_Jan), "|", fixed = TRUE)))

Upvotes: 2

Related Questions