Reputation: 693
I have a data frame of the following form:
Column1 = c('Elephant,Starship Enterprise,Cat','Random word','Word','Some more words, Even more words')
Column2=c('Rat,Starship Enterprise,Elephant','Ocean','No','more')
d1 = data.frame(Column1,Column2)
What I want to do is to look for and count the exact match of words in column 1 and column 2. Each column can have multiple words separated by a comma.
For example in row 1, we see there are two common words a) Starship Enterprise and b) Elephant. However, in row 4, even though the word "more" appears in both columns, the exact string (Some more words and Even more words) do not appear. The expected output would be something like this.
Any help will be appreciated.
Upvotes: 2
Views: 560
Reputation: 886938
We can do this with cSplit
library(splitstackshape)
library(data.table)
v1 <- cSplit(setDT(d1, keep.rownames = TRUE), 2:3, ",", "long")[,
length(intersect(na.omit(Column1), na.omit(Column2))), rn]$V1
d1[, Common := v1][, rn := NULL][]
# Column1 Column2 Common
#1: Elephant,Starship Enterprise,Cat Rat,Starship Enterprise,Elephant 2
#2: Random word Ocean 0
#3: Word No 0
#4: Some more words, Even more words more 0
Upvotes: 1
Reputation: 388807
Split columns on comma and count the intersection of words
mapply(function(x, y) length(intersect(x, y)),
strsplit(d1$Column1, ","), strsplit(d1$Column2, ","))
#[1] 2 0 0 0
Or a tidyverse
way
library(tidyverse)
d1 %>%
mutate(Common = map2_dbl(Column1, Column2, ~
length(intersect(str_split(.x, ",")[[1]], str_split(.y, ",")[[1]]))))
# Column1 Column2 Common
#1 Elephant,Starship Enterprise,Cat Rat,Starship Enterprise,Elephant 2
#2 Random word Ocean 0
#3 Word No 0
#4 Some more words, Even more words more 0
Upvotes: 2