Reputation: 71
How can I put the diff elements b/w factor_Nov and factor_Jan in a new column called diff
df=data.frame(id=c("1","2","3"),
factor_Nov=c("A|B|C","E","F|H|G"),
factor_Jan=c("B|H|E","E","X|Y|Z"))
the output should be
df=data.frame(id=c("1","2","3"),
factor_Nov=c("A|B|C","E","F|H|G"),
factor_Jan=c("B|H|E","E","X|Y|Z"),
diff=c("A|C|H|E",NA,"X|Y|Z|F|H|G"))
I tried setdiff but that wasn't working
Upvotes: 2
Views: 51
Reputation: 102241
A data.table
option
setDT(df)[
,
diff := do.call(
Map,
c(
function(...) paste0(setdiff(union(...), intersect(...)), collapse = "|"),
unname(lapply(.SD, strsplit, split = "\\|"))
)
),
.SDcols = patterns("^factor_")
]
gives
> df
id factor_Nov factor_Jan diff
1: 1 A|B|C B|H|E A|C|H|E
2: 2 E E
3: 3 F|H|G X|Y|Z F|H|G|X|Y|Z
Upvotes: 0
Reputation: 39613
With tidyverse
:
library(dplyr)
library(tidyr)
#Code
new <- df %>% left_join(
df %>% separate_rows(c(factor_Nov,factor_Jan)) %>%
pivot_longer(-id) %>%
group_by(id,value) %>%
filter(n() == 1) %>%
ungroup() %>% arrange(id,value) %>%
group_by(id) %>%
summarise(Diff=paste0(value,collapse = '|')))
Output:
id factor_Nov factor_Jan Diff
1 1 A|B|C B|H|E A|C|E|H
2 2 E E <NA>
3 3 F|H|G X|Y|Z F|G|H|X|Y|Z
Upvotes: 1
Reputation: 887501
An option is to split the columns with strsplit
, using delimiter as |
, then use Map
to get the elements that are not intersect
, paste
them with collapse = "|"
df$diff <- unlist(Map(function(x, y) paste(setdiff(union(x, y),
intersect(x, y)), collapse="|"),
strsplit(as.character(df$factor_Nov), "|", fixed = TRUE),
strsplit(as.character(df$factor_Jan), "|", fixed = TRUE)))
Upvotes: 2