Reputation: 661
I have two data sets and I'm trying to create a data set based on a partial match of two fields from these two data sets.
I'm using dplyr
filter and need to find partial partials in each row. I tried to use str_replace()
but that doesn't seem to work on table$col.
Minimal reproducible example:
library(dplyr)
id <- c('1','2','3')
code<- c('a1231','b3211','c9871985')
tbl<- data.frame(id,code)
other_cd <- c('a123','b321','c987')
other_cd <- data.frame(other_cd)
match <- tbl %>% dplyr::filter(code %in% other_cd$other_cd) %>%
dplyr::summarise(count = n_distinct(id))
Below is what I tried using str_replace()
fuzzy_match <- tbl %>% dplyr::filter(code %in% str_detect(other_cd$other_cd, "^[other_cd$other_cd]")) %>%
dplyr::summarise(count = n_distinct(id))
I would like for fuzzy_match
to contain 3 rows with the partially matched items, so the output may look something like:
id code other_cd
1 a1231 a123
2 b3211 b321
3 c9871985 c987
Upvotes: 1
Views: 231
Reputation: 886938
We could paste
the elements of 'other_cd' separated by |
for matching any of the elements
library(dplyr)
library(stringr)
tbl %>%
filter(str_detect(code, str_c(other_cd$other_cd, collapse="|"))) %>%
summarise(count = n_distinct(id))
In the updated post, OP wants to create a new column from the other_cd. In that case, we can use str_extract
tbl %>%
mutate(other_cd = str_extract(code, str_c(other_cd$other_cd, collapse="|")))
# id code other_cd
#1 1 a1231 a123
#2 2 b3211 b321
#3 3 c9871985 c987
Or if the number of rows are the same
tbl %>%
filter(str_detect(code, as.character(other_cd$other_cd)))
Upvotes: 2