Reputation: 103
I have two data frames. The values in one column of the second data frame were created from a column in the first. I need to map each newly created value back to the parent data frame value it was created from. An example is given below.
df1: parent data frame
id sname SID
23 S_FRGD_56 22
45 S_TYR_23 22
12 S_IUFY_82 15
From the column 'sname' I created a new name for each row and assigned an id to each new name, as below:
df2:
newid new_sname SID
675 NewS_TYR_23 22
56 NewS_IUFY_82 15
124 NewS_FRGD_56 22
In the two data frames above, I created the new_sname 'NewS_FRGD_56' in df2 from the sname 'S_FRGD_56' in df1, so I have to map 'NewS_FRGD_56' back to 'S_FRGD_56'.
Expected output:
id sname newid new_sname SID
23 S_FRGD_56 124 NewS_FRGD_56 22
45 S_TYR_23 675 NewS_TYR_23 22
12 S_IUFY_82 56 NewS_IUFY_82 15
If we join on SID alone, there is a chance that an sname will be wrongly mapped to some other new_sname, because two snames can have the same SID.
Upvotes: 0
Views: 44
Reputation: 30474
One approach to this is to use the fuzzyjoin package. This will detect a substring of sname within new_sname using str_detect from stringr.
library(stringr)
library(fuzzyjoin)
fuzzy_inner_join(
df2,
df1,
by = c("new_sname" = "sname"),
match_fun = str_detect
)
Output
newid new_sname SID.x id sname SID.y
1 675 NewS_TYR_23 22 45 S_TYR_23 22
2 56 NewS_IUFY_82 15 12 S_IUFY_82 15
3 124 NewS_FRGD_56 22 23 S_FRGD_56 22
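If every new_sname is just the original sname with a fixed "New" prefix, as it is in the sample data, an exact join in base R might also work. This is only a sketch under that assumption; the data frames are rebuilt from the question so the snippet runs on its own, and the helper column df2$sname is introduced here purely for illustration.

```r
# Sample data copied from the question
df1 <- data.frame(
  id    = c(23, 45, 12),
  sname = c("S_FRGD_56", "S_TYR_23", "S_IUFY_82"),
  SID   = c(22, 22, 15)
)
df2 <- data.frame(
  newid     = c(675, 56, 124),
  new_sname = c("NewS_TYR_23", "NewS_IUFY_82", "NewS_FRGD_56"),
  SID       = c(22, 15, 22)
)

# Assumption: new_sname is always "New" followed by the original sname,
# so stripping the leading "New" recovers the join key
df2$sname <- sub("^New", "", df2$new_sname)

# Exact join on both sname and SID, which avoids the ambiguity of SID alone
result <- merge(df1, df2, by = c("sname", "SID"))

# Reorder the columns to match the expected output
result <- result[, c("id", "sname", "newid", "new_sname", "SID")]
result
```

Note that merge sorts rows by the join keys by default, so the row order may differ from the expected output. If the naming rule is less regular than a fixed prefix, the substring match above is the safer choice.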
Upvotes: 2