Reputation: 61
col1 | col2 |
---|---|
First,Second,Other | row,First |
Second,Other,Other2 | row,Second |
I would like to create a new column with the values that are in col1 and not in col2:
col1 | col2 | col3 |
---|---|---|
First,Second,Other | row,First | Second,Other |
Second,Other,Other2 | row,Second | Other,Other2 |
And what if the separator is "||" instead of ","?
Upvotes: 1
Views: 35
Reputation: 8844
Another base R approach. If you want col3 to be a list of vectors, do this:
sep <- ","
d$col3 <- Vectorize(setdiff, SIMPLIFY = FALSE)(
  strsplit(d$col1, sep, fixed = TRUE),
  strsplit(d$col2, sep, fixed = TRUE)
)
Output
> d$col3
[[1]]
[1] "Second" "Other"
[[2]]
[1] "Other" "Other2"
> d
col1 col2 col3
1 First,Second,Other row,First Second, Other
2 Second,Other,Other2 row,Second Other, Other2
If you want a plain character column instead, add one more step that collapses each vector back into a single string:
d$col3 <- sapply(d$col3, paste0, collapse = sep)
All together
sep <- ","
d$col3 <- sapply(Vectorize(setdiff, SIMPLIFY = FALSE)(
  strsplit(d$col1, sep, fixed = TRUE),
  strsplit(d$col2, sep, fixed = TRUE)
), paste0, collapse = sep)
Data
d <- structure(list(
  col1 = c("First,Second,Other", "Second,Other,Other2"),
  col2 = c("row,First", "row,Second")
), class = "data.frame", row.names = c(NA, -2L))
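As a side note to the list-column variant, here is a minimal sketch (not part of the answer itself) that uses base R's Map() to do the pairwise setdiff() and lengths() to count how many values survive in each row. The data frame d_demo and the names res and n_kept are hypothetical, introduced only so the snippet runs on its own.

```r
# d_demo is a hypothetical stand-in for d, so the snippet is self-contained
d_demo <- data.frame(
  col1 = c("First,Second,Other", "Second,Other,Other2"),
  col2 = c("row,First", "row,Second"),
  stringsAsFactors = FALSE
)
sep <- ","

# Map() applies setdiff() to each pair of split rows and always returns a list
res <- Map(
  setdiff,
  strsplit(d_demo$col1, sep, fixed = TRUE),
  strsplit(d_demo$col2, sep, fixed = TRUE)
)

# number of values kept per row
n_kept <- lengths(res)
```

Keeping the result as a list is handy when the leftover values need further processing; collapse it with paste only when a flat string is what you need.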
Upvotes: 0
Reputation: 56004
Loop over the rows, split both columns, take the set difference, and paste the result back together:
d$col3 <- apply(d, 1, function(i) {
  paste(setdiff(unlist(strsplit(i[1], ",")),
                unlist(strsplit(i[2], ","))), collapse = ",")
})
d
# col1 col2 col3
# 1 First,Second,Other row,First Second,Other
# 2 Second,Other,Other2 row,Second Other,Other2
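To split on "||" instead, change the strsplit calls in the code above to use split = "||" with fixed = TRUE, because "|" is a regular-expression metacharacter and must be matched literally (and use collapse = "||" when pasting back):
# example for "||"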
strsplit("First||Second||Other", split = "||", fixed = TRUE)
# [[1]]
# [1] "First" "Second" "Other"
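Putting the pieces together for the "||" case, a minimal sketch might look like the following. The helper diff_tokens and the data frame d2 are hypothetical names, not from the answers above, and mapply is swapped in for the row-wise apply loop.

```r
# Hypothetical helper: values in x that are not in y, element by element.
# fixed = TRUE makes strsplit treat sep literally, so "||" is not a regex.
diff_tokens <- function(x, y, sep = ",") {
  mapply(
    function(a, b) paste(setdiff(a, b), collapse = sep),
    strsplit(x, sep, fixed = TRUE),
    strsplit(y, sep, fixed = TRUE),
    USE.NAMES = FALSE
  )
}

# Hypothetical sample data using "||" as the separator
d2 <- data.frame(
  col1 = c("First||Second||Other", "Second||Other||Other2"),
  col2 = c("row||First", "row||Second"),
  stringsAsFactors = FALSE
)

d2$col3 <- diff_tokens(d2$col1, d2$col2, sep = "||")
d2$col3
# [1] "Second||Other" "Other||Other2"
```

Passing the separator as an argument keeps the split and the collapse in sync, so the same function covers both "," and "||".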
Upvotes: 2