cowboy
cowboy

Reputation: 661

How do I match on partial field matches in two R data sets using dplyr

I have two data sets and I'm trying to create a data set based on a partial match of two fields from these two data sets.

I'm using dplyr filter and need to find partial partials in each row. I tried to use str_replace() but that doesn't seem to work on table$col.

Minimal reproducible example:

library(dplyr)

id <- c('1','2','3')
code<- c('a1231','b3211','c9871985')

tbl<- data.frame(id,code)

other_cd <- c('a123','b321','c987')
other_cd <- data.frame(other_cd)



match <- tbl %>% dplyr::filter(code %in% other_cd$other_cd) %>%
  dplyr::summarise(count = n_distinct(id))

Below is what I tried using str_replace()

fuzzy_match <- tbl %>% dplyr::filter(code %in% str_detect(other_cd$other_cd, "^[other_cd$other_cd]")) %>%
  dplyr::summarise(count = n_distinct(id))

I would like for fuzzy_match to contain 3 rows with the partially matched items, so the output may look something like:

id        code      other_cd
1         a1231      a123
2         b3211      b321
3         c9871985   c987  

Upvotes: 1

Views: 231

Answers (1)

akrun
akrun

Reputation: 886938

We could paste the elements of 'other_cd' separated by | for matching any of the elements

library(dplyr)
library(stringr)
tbl %>%
     filter(str_detect(code, str_c(other_cd$other_cd, collapse="|"))) %>%
      summarise(count = n_distinct(id))

Update

In the updated post, OP wants to create a new column from the other_cd. In that case, we can use str_extract

tbl %>% 
   mutate(other_cd = str_extract(code, str_c(other_cd$other_cd, collapse="|")))
#   id     code other_cd
#1  1    a1231     a123
#2  2    b3211     b321
#3  3 c9871985     c987

Or if the number of rows are the same

tbl %>% 
    filter(str_detect(code, as.character(other_cd$other_cd)))

Upvotes: 2

Related Questions