Reputation: 65
Example that works:
df <- data.frame(c0=c(1, 2), c1=c("A,B,C", "D,E,F"), c2=c("B,C", "D,E"))
df
# c0 c1 c2
# 1 1 A,B,C B,C
# 2 2 D,E,F D,E
# Add a column d with difference between c1 and c2
df %>% mutate(d=setdiff(unlist(strsplit(as.character(c1), ",")), unlist(strsplit(as.character(c2), ","))))
# c0 c1 c2 d
# 1 1 A,B,C B,C A
# 2 2 D,E,F D,E F
I get what I expected above: d is assigned the difference between these two lists of characters (they are already sorted).
However, if I introduce more than one different character it no longer works:
df <- data.frame(c0=c(1, 2), c1=c("A,B,C", "D,E,F,G"), c2=c("B,C", "D,E"))
df
# c0 c1 c2
# 1 1 A,B,C B,C
# 2 2 D,E,F,G D,E
# Add a column d with difference between c1 and c2
df %>% mutate(d=setdiff(unlist(strsplit(as.character(c1), ",")), unlist(strsplit(as.character(c2), ","))))
Error: wrong result size (3), expected 2 or 1
What I wanted to get there is:
c0 c1 c2 d
1 1 A,B,C B,C A
2 2 D,E,F,G D,E F,G
I've tried adding a paste()
around setdiff but that didn't help. In the end I actually want to be able to probably use tidyr::separate
to split out the d column into new rows like:
c0 c1 c2 d
1 1 A,B,C B,C A
2 2 D,E,F,G D,E F
3 2 D,E,F,G D,E G
What am I doing wrong with the setdiff above?
Thanks
Tim
Upvotes: 3
Views: 1165
Reputation: 214957
You get the error because at row 2 you have more than one element which can not fit a cell, one way is to use rowwise
and wrap the result as list so that it can fit and after that use unnest
from tidyr
to expand the list type column:
library(dplyr)
library(tidyr)
df %>%
rowwise() %>%
mutate(d=list(setdiff(unlist(strsplit(as.character(c1), ",")),
unlist(strsplit(as.character(c2), ","))))) %>%
unnest()
# Source: local data frame [3 x 4]
# c0 c1 c2 d
# <dbl> <fctr> <fctr> <chr>
# 1 1 A,B,C B,C A
# 2 2 D,E,F,G D,E F
# 3 2 D,E,F,G D,E G
Upvotes: 1