Tim
Tim

Reputation: 65

dplyr::mutate:- new column = difference between two comma-delimited list columns

Example that works:

df <- data.frame(c0=c(1, 2), c1=c("A,B,C", "D,E,F"), c2=c("B,C", "D,E"))
df
#   c0    c1  c2
# 1  1 A,B,C B,C
# 2  2 D,E,F D,E

# Add a column d with difference between c1 and c2
df %>% mutate(d=setdiff(unlist(strsplit(as.character(c1), ",")), unlist(strsplit(as.character(c2), ","))))

#   c0    c1  c2 d
# 1  1 A,B,C B,C A
# 2  2 D,E,F D,E F

I get what I expected above: d is assigned the difference between these two lists of characters (they are already sorted).

However, if I introduce more than one different character it no longer works:

df <- data.frame(c0=c(1, 2), c1=c("A,B,C", "D,E,F,G"), c2=c("B,C", "D,E"))
df
#   c0      c1  c2
# 1  1   A,B,C B,C
# 2  2 D,E,F,G D,E

# Add a column d with difference between c1 and c2
df %>% mutate(d=setdiff(unlist(strsplit(as.character(c1), ",")), unlist(strsplit(as.character(c2), ","))))
Error: wrong result size (3), expected 2 or 1

What I wanted to get there is:

  c0    c1    c2  d
1  1 A,B,C    B,C A
2  2 D,E,F,G  D,E F,G

I've tried adding a paste() around setdiff but that didn't help. In the end I actually want to be able to probably use tidyr::separate to split out the d column into new rows like:

  c0    c1    c2  d
1  1 A,B,C    B,C A
2  2 D,E,F,G  D,E F
3  2 D,E,F,G  D,E G

What am I doing wrong with the setdiff above?

Thanks

Tim

Upvotes: 3

Views: 1165

Answers (1)

akuiper
akuiper

Reputation: 214957

You get the error because at row 2 you have more than one element which can not fit a cell, one way is to use rowwise and wrap the result as list so that it can fit and after that use unnest from tidyr to expand the list type column:

library(dplyr)
library(tidyr)
df %>% 
      rowwise() %>% 
      mutate(d=list(setdiff(unlist(strsplit(as.character(c1), ",")), 
                            unlist(strsplit(as.character(c2), ","))))) %>% 
      unnest()

# Source: local data frame [3 x 4]

#      c0      c1     c2     d
#   <dbl>  <fctr> <fctr> <chr>
# 1     1   A,B,C    B,C     A
# 2     2 D,E,F,G    D,E     F
# 3     2 D,E,F,G    D,E     G

Upvotes: 1

Related Questions