Karlos Garcia

Reputation: 61

Detect new values when comparing existing variables in a dataframe and add them in a new variable in R

col1                 col2
First,Second,Other   row,First
Second,Other,Other2  row,Second

I would like to create a new column with the values that are in col1 and not in col2:

col1                 col2        col3
First,Second,Other   row,First   Second,Other
Second,Other,Other2  row,Second  Other,Other2

And what if the separator is || instead of a comma?

Upvotes: 1

Views: 35

Answers (2)

ekoam

Reputation: 8844

Another base R approach. If you want col3 to be a list of character vectors, do this:

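sep = ","
# Split both columns on the literal separator, then take the
# element-wise set difference (values in col1 but not in col2)
d$col3 <- Vectorize(setdiff, SIMPLIFY = FALSE)(
  strsplit(d$col1, sep, fixed = TRUE), 
  strsplit(d$col2, sep, fixed = TRUE)
)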

Output

> d$col3
[[1]]
[1] "Second" "Other" 

[[2]]
[1] "Other"  "Other2"

> d
                 col1       col2          col3
1  First,Second,Other  row,First Second, Other
2 Second,Other,Other2 row,Second Other, Other2

If you would rather have col3 as a plain character vector, collapse each element back into a single string:

d$col3 <- sapply(d$col3, paste0, collapse = sep)

All together

sep = ","
d$col3 <- sapply(Vectorize(setdiff, SIMPLIFY = FALSE)(
  strsplit(d$col1, sep, fixed = TRUE), 
  strsplit(d$col2, sep, fixed = TRUE)
), paste0, collapse = sep)

Data

d <- structure(list(col1 = c("First,Second,Other", "Second,Other,Other2"
), col2 = c("row,First", "row,Second")), class = "data.frame", row.names = c(NA, 
-2L))
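
The same idea can be wrapped in a small function so that the separator becomes a parameter. This is only a minimal sketch: the helper name token_diff is hypothetical (it is not part of the answer above), and it uses Map and vapply in place of Vectorize and sapply so that the result is always a character vector.

```r
# Hypothetical helper: for each pair of strings, return the tokens that
# appear in x but not in y, pasted back together with the same separator.
token_diff <- function(x, y, sep = ",") {
  xs <- strsplit(x, sep, fixed = TRUE)  # fixed = TRUE: treat sep literally
  ys <- strsplit(y, sep, fixed = TRUE)
  # Map pairs the i-th element of xs with the i-th element of ys;
  # vapply guarantees exactly one string per row.
  vapply(Map(setdiff, xs, ys), paste, character(1), collapse = sep)
}

d <- data.frame(
  col1 = c("First,Second,Other", "Second,Other,Other2"),
  col2 = c("row,First", "row,Second"),
  stringsAsFactors = FALSE
)
d$col3 <- token_diff(d$col1, d$col2)
```

Because the split uses fixed = TRUE, calling token_diff(x, y, sep = "||") should work the same way for the "||" separator asked about in the question.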

Upvotes: 0

zx8754

Reputation: 56004

Loop rows, split, get the set difference, finally paste them back together again:

d$col3 <- apply(d, 1, function(i) {
  paste(setdiff(unlist(strsplit(i[ 1 ], ",")),
                unlist(strsplit(i[ 2 ], ","))), collapse = ",")})

d
#                  col1       col2         col3
# 1  First,Second,Other  row,First Second,Other
# 2 Second,Other,Other2 row,Second Other,Other2

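To split on "||" instead, add fixed = TRUE to both strsplit calls in the code above (| is a regular-expression metacharacter, so it must be matched literally) and use collapse = "||" in paste: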

#example for ||
strsplit("First||Second||Other", split = "||", fixed = TRUE)
# [[1]]
# [1] "First"  "Second" "Other" 
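
For completeness, here is a sketch of the row-wise code adapted to "||". The data frame d2 is made up for illustration by swapping the separator in the example data, and the variable sep is introduced here so that the same string is used for both splitting and pasting.

```r
# Illustrative data: the example rows with "||" as the separator
d2 <- data.frame(
  col1 = c("First||Second||Other", "Second||Other||Other2"),
  col2 = c("row||First", "row||Second"),
  stringsAsFactors = FALSE
)

sep <- "||"
d2$col3 <- apply(d2, 1, function(i) {
  # fixed = TRUE makes strsplit match "||" literally, not as a regex
  paste(setdiff(unlist(strsplit(i[1], sep, fixed = TRUE)),
                unlist(strsplit(i[2], sep, fixed = TRUE))),
        collapse = sep)
})
```

Keeping the separator in one variable avoids the easy mistake of splitting on "||" but pasting back with ",".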

Upvotes: 2
