Reputation: 97
Is there a way where I can find all the partial matches from df_2 to df_1?
partial match (if part of DF_1 string is in the whole string of DF_2) For example, part of "for solution" is in the whole string of "solution"
df_1=data.frame(
DF_1=c("suspension","tablet","for solution","capsule")
)
df_2=data.frame(
index=c("1","2","3","4","5"),
DF_2=c("for suspension", "suspension", "solution", "tablet,ER","tablet,IR")
)
df_out=data.frame(
DF_1=c("suspension","suspension","tablet","tablet","for solution"),
DF_2=c("for suspension", "suspension","tablet,ER","tablet,IR","solution"),
index=c("1","2","4","5","3")
)
Upvotes: 1
Views: 1062
Reputation: 101044
Here is a base R option using nested *apply
+ grepl
df_out <- within(
df_2,
DF_1 <- unlist(sapply(
DF_2,
function(x) {
Filter(
Negate(is.na),
lapply(
df_1$DF_1,
function(y) ifelse(grepl(y, x), y, ifelse(grepl(x, y), x, NA))
)
)
}
), use.names = FALSE)
)
such that
> df_out
index DF_2 DF_1
1 1 for suspension suspension
2 2 suspension suspension
3 3 solution solution
4 4 tablet,ER tablet
5 5 tablet,IR tablet
Upvotes: 0
Reputation: 994
Following @Akrun suggestion of using fuzzyjoin
According to your expected output, you want to join twice, and you want to perform inner_join
.
Finally you'll match twice if there is a perfect match, which is why you want to deduplicate (I did it with distinct
from dplyr
but you can do it with what you want.
df_out = distinct(
rbind(
regex_inner_join(df_1, df_2, by = c("DF_1"= "DF_2")),
regex_inner_join(df_2, df_1, by = c("DF_2"= "DF_1"))
)
)
df_out
The output is:
DF_1 index DF_2
1 suspension 2 suspension
2 for solution 3 solution
3 suspension 1 for suspension
4 tablet 4 tablet,ER
5 tablet 5 tablet,IR
You find your expected table, not in the same order though (lines & columns).
Upvotes: 1
Reputation: 886938
We can use fuzzyjoin
library(fuzzyjoin)
regex_left_join(df_2, df_1, by = c("DF_2"= "DF_1"))
Upvotes: 0
Reputation: 4067
This sounds like a job for grepl()
!
E.g. grepl(value, chars, fixed = TRUE)
Let me quote an example from a different answer:
> chars <- "test"
> value <- "es"
> grepl(value, chars)
[1] TRUE
> chars <- "test"
> value <- "et"
> grepl(value, chars)
[1] FALSE
Upvotes: 0