Hmm

Reputation: 103

Map a column of one data frame to a column of another data frame in R (map a child element back to the parent it was created from)

I have two data frames. The values in one column of the second data frame were created from a column of the first. I need to map each newly created value back to the parent value it was created from. An example is given below.

df1: parent data frame

id    sname        SID
23    S_FRGD_56    22
45    S_TYR_23     22
12    S_IUFY_82    15

From the column 'sname' I created a new name, and gave that new name its own id, as below.

df2:
newid    new_sname       SID
675      NewS_TYR_23     22
56       NewS_IUFY_82    15
124      NewS_FRGD_56    22

In the two data frames above, I created the new_sname 'NewS_FRGD_56' in df2 from the sname 'S_FRGD_56' in df1, so I need to record that 'NewS_FRGD_56' belongs to 'S_FRGD_56'.

Expected output:
id    sname        newid    new_sname       SID
23    S_FRGD_56    124      NewS_FRGD_56    22
45    S_TYR_23     675      NewS_TYR_23     22
12    S_IUFY_82    56       NewS_IUFY_82    15

If we join on SID alone, there is a chance that an sname will be mapped to the wrong new_sname, because two snames can have the same SID.
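
To illustrate the concern about joining on SID, here is a minimal base R sketch that rebuilds the example data and merges on SID only. The object name by_sid is introduced here purely for illustration.

```r
# Rebuild the example data from the tables in this question
df1 <- data.frame(
  id    = c(23, 45, 12),
  sname = c("S_FRGD_56", "S_TYR_23", "S_IUFY_82"),
  SID   = c(22, 22, 15),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  newid     = c(675, 56, 124),
  new_sname = c("NewS_TYR_23", "NewS_IUFY_82", "NewS_FRGD_56"),
  SID       = c(22, 15, 22),
  stringsAsFactors = FALSE
)

# Join on SID only: every df1 row with SID 22 is paired with
# every df2 row with SID 22, so some pairs are mismatched
by_sid <- merge(df1, df2, by = "SID")

# Rows where the new name does not correspond to the parent name
wrong <- by_sid[by_sid$new_sname != paste0("New", by_sid$sname), ]
```

Because SID 22 appears twice in each data frame, by_sid has 5 rows instead of 3, and wrong holds the two mismatched pairs.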

Upvotes: 0

Views: 44

Answers (1)

Ben

Reputation: 30474

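One approach is to use the fuzzyjoin package. With match_fun = str_detect (from stringr), fuzzy_inner_join keeps every pair of rows where sname is found as a substring of new_sname. Note that str_detect treats sname as a regular expression, so this relies on the names containing no regex metacharacters and on no sname being a substring of an unrelated new_sname.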

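library(stringr)    # provides str_detect()
library(fuzzyjoin)  # provides fuzzy_inner_join()

# Keep each (df2 row, df1 row) pair where str_detect(new_sname, sname) is TRUE
fuzzy_inner_join(
  df2,
  df1,
  by = c("new_sname" = "sname"),
  match_fun = str_detect
)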

Output

  newid    new_sname SID.x id     sname SID.y
1   675  NewS_TYR_23    22 45  S_TYR_23    22
2    56 NewS_IUFY_82    15 12 S_IUFY_82    15
3   124 NewS_FRGD_56    22 23 S_FRGD_56    22
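
If, as in the example data, every new_sname is simply the original sname with a "New" prefix, an exact join is a possible alternative to substring matching. The following is a minimal base R sketch under that assumption. The prefix rule is inferred from the example rather than stated in the question, and the helper column df2$sname and the object name result are introduced here for illustration.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  id    = c(23, 45, 12),
  sname = c("S_FRGD_56", "S_TYR_23", "S_IUFY_82"),
  SID   = c(22, 22, 15),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  newid     = c(675, 56, 124),
  new_sname = c("NewS_TYR_23", "NewS_IUFY_82", "NewS_FRGD_56"),
  SID       = c(22, 15, 22),
  stringsAsFactors = FALSE
)

# Assumption: new_sname is "New" followed by the parent sname,
# so stripping the leading "New" recovers the parent name
df2$sname <- sub("^New", "", df2$new_sname)

# Exact join on the recovered name and on SID
result <- merge(df1, df2, by = c("sname", "SID"))

# Reorder the columns to match the expected output in the question
result <- result[, c("id", "sname", "newid", "new_sname", "SID")]
```

Joining on both sname and SID avoids the ambiguity of joining on SID alone, and an exact match cannot pair a name with a longer name that merely contains it. If the real naming rule differs from the "New" prefix, the sub() call would need to be adapted.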

Upvotes: 2
