Reputation: 106
Consider A,B,C,D .... as words. I have two DFs.
ColA
A B
B C
C D
E F
G H
A M
M
ColB
A B C D X Y Z
C D M N F K L
S H A F R M T U
Operation: I want to search all element of df1 in df2 then append all the matching values in a new column OR may be create multiple rows.
ColB COlB
A B C D X Y Z A,A B,B C,C D
C D M N F K L C D,M
S H A F R M T U A,A M
ColB Output
A B C D X Y Z A
A B C D X Y Z A B
A B C D X Y Z B C
A B C D X Y Z C D
C D M N F K L C D
C D M N F K L M
S H A F R M T U A
S H A F R M T U A M
Upvotes: 0
Views: 62
Reputation: 14902
I think this will do it, although it differs a bit from your expected answer, which I think is wrong.
First set up the input data frames:
# set up the data
df1 <- data.frame(ColA = c("A B",
"B C",
"C D",
"E F",
"G H",
"A M",
"M"),
stringsAsFactors = FALSE)
df2 <- data.frame(ColB = c("A B C D X Y Z",
"C D M N F K L",
"S H A F R M T"),
stringsAsFactors = FALSE)
Next we will form all the pairwise combinations of the things to search with the things to be searched:
# create a vector of patterns and items to search
intermediate <- as.vector(outer(df2$ColB, df1$ColA, paste, sep = "|"))
# split it into a list
intermediate <- strsplit(intermediate, "|", fixed = TRUE)
Then we can create a function to match the elements for each row of this full combination dataset The core is the foundMatch
which returns a logical indicating whether all elements in ColA
were present in ColB
. In your examples, order does not matter, so here we split the elements and look for all of the first to be in the second.
# set up the output data.frame
Output2 <- data.frame(do.call(rbind, intermediate))
names(Output2) <- c("ColB", "Output")
# here is the core, which does the element matching
foundMatch <- apply(Output2, 1, function(x) {
tokens <- strsplit(x, " ", fixed = TRUE)
all(tokens[[2]] %in% tokens[[1]])
})
# filter out the ones with the match
Output2 <- Output2[foundMatch, ]
Output2
## ColB Output
## 1 A B C D X Y Z A B
## 2 C D M N F K L A B
## 3 S H A F R M T A B
## 10 A B C D X Y Z E F
## 14 C D M N F K L G H
## 20 C D M N F K L M
## 21 S H A F R M T M
Not exactly what you have above but I think it's correct.
Upvotes: 1
Reputation: 17648
It is not obvious for me how your data.frames df1
and df2
are built. But you can try to vectorise your data and match both sets.
d1 <- sort(as.character(unlist(df1)))
d2 <- sort(as.character(unlist(df2)))
# get the intersection/difference without duplicates
intersect(d1,d2)
setdiff(d1,d2)
# get all values matching with the first or with the second dataset, respectively
d1[ d1 %in% d2 ]
d2[ d2 %in% d1 ]
Upvotes: 0