Maya

Reputation: 55

R - Extract the matching and non-matching portions of strings

I need to extract portions of strings that match and those that do not match between two columns:

x <- c("apple, banana, pine nuts, almond")
y <- c("orange, apple, almond, grapes, carrots")
j <- data.frame(x,y)

To get:

yonly <- c("orange, grapes, carrots")
xonly <- c("banana, pine nuts")
both <- c("apple, almond")
k <- data.frame(cbind(x,y,both,yonly,xonly))

I looked into str_detect, intersect, etc., but these appear to require major surgery on the existing cells to split them into separate cells. This is a sizable data set with other columns, so I'd prefer not to manipulate it too much. Can you help me come up with a simpler solution?

Thanks!

Upvotes: 1

Views: 267

Answers (2)

Andrew Gustar

Reputation: 18425

To add the extra columns to a longer data frame j as you described, you could use mapply to apply the setdiff/intersect approach from Jilber Urbina's answer to each row:

#set up data
x <- c("apple, banana, pine nuts, almond")
y <- c("orange, apple, almond, grapes, carrots")
j <- data.frame(x,y,stringsAsFactors = FALSE)

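# mapply() returns a 3 x n matrix (one column per row of j), so transpose it
# with t() to get one row per row of j before assigning the new columns
j[,c("yonly","xonly","both")] <- t(mapply(function(x,y) {
                    # split each cell into its comma-separated items
                    x2 <- unlist(strsplit(x, ",\\s*"))
                    y2 <- unlist(strsplit(y, ",\\s*"))
                    yonly <- paste(setdiff(y2, x2), collapse=", ")
                    xonly <- paste(setdiff(x2, y2), collapse=", ")
                    both <- paste(intersect(x2, y2), collapse=", ")
                    return(c(yonly, xonly, both))      },
                                        j$x,j$y))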

j
                                 x                                      y                   yonly             xonly          both
1 apple, banana, pine nuts, almond orange, apple, almond, grapes, carrots orange, grapes, carrots banana, pine nuts apple, almond
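As a rough check that the per-row approach carries over to more than one row, here is a minimal sketch on a made-up two-row data frame. The names compare_items and j2, and the second row of data, are assumptions for illustration and are not part of the question.

```r
# Hypothetical helper: compare two comma-separated strings and return
# the items only in y, only in x, and in both, each as one string.
compare_items <- function(x, y) {
  x2 <- strsplit(x, ",\\s*")[[1]]
  y2 <- strsplit(y, ",\\s*")[[1]]
  c(yonly = paste(setdiff(y2, x2), collapse = ", "),
    xonly = paste(setdiff(x2, y2), collapse = ", "),
    both  = paste(intersect(x2, y2), collapse = ", "))
}

# Made-up two-row data frame; the second row is invented for this sketch.
j2 <- data.frame(
  x = c("apple, banana, pine nuts, almond", "kiwi, pear"),
  y = c("orange, apple, almond, grapes, carrots", "pear, plum"),
  stringsAsFactors = FALSE
)

# mapply() gives a 3 x n matrix; t() makes it n x 3, one row per row of j2.
res <- t(mapply(compare_items, j2$x, j2$y, USE.NAMES = FALSE))
j2[, colnames(res)] <- res
j2
```

Returning a named vector from the helper lets the new column names come from the result itself, so they cannot drift out of step with the order of the values.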

Upvotes: 1

Jilber Urbina

Reputation: 61154

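You can use setdiff and intersect once each string has been split into its items with strsplit. The [[1]] below picks out the first row only: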

> j <- data.frame(x,y, stringsAsFactors = FALSE)
> X <- strsplit(j$x, ",\\s*")[[1]]
> Y <- strsplit(j$y, ",\\s*")[[1]]
> 
> #Yonly
> setdiff(Y, X)
[1] "orange"  "grapes"  "carrots"
> 
> #Xonly
> setdiff(X, Y)
[1] "banana"    "pine nuts"
> 
> #Both
> intersect(X, Y)
[1] "apple"  "almond"
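One caveat: setdiff and intersect compare items exactly, so stray spaces around the commas would make otherwise identical items look different. The sample data is clean, but if the real cells are not (an assumption), a sketch along these lines could normalise the items first. The helper name split_items and the messy input string are made up for illustration.

```r
# Hypothetical helper: split on commas, trim surrounding whitespace,
# and drop any empty items left by doubled or trailing commas.
split_items <- function(s) {
  items <- trimws(strsplit(s, ",", fixed = TRUE)[[1]])
  items[nzchar(items)]
}

# Invented input with inconsistent spacing, for illustration only.
X <- split_items(" apple,banana ,  pine nuts, almond ")
Y <- split_items("orange, apple, almond, grapes, carrots")

paste(intersect(X, Y), collapse = ", ")  # items in both
paste(setdiff(X, Y), collapse = ", ")    # items only in X
paste(setdiff(Y, X), collapse = ", ")    # items only in Y
```

The comparison is still case-sensitive; wrapping the items in tolower() would be one way to relax that, if it suits the data.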

Upvotes: 4
