Reputation: 15
Some of the data I have been working on has been re-ID'd several times.
To work on it efficiently, I need to merge df1 and df2 according to their ids.
I have tried several approaches based on separate(), grep(), and fuzzy_join(), but because id2 of df2 contains longer ids than id1 of df1 (several ids joined with "_"), I couldn't get any of them to work.
Representative df1 and df2 are below:
View(df1)
id1 value1
N12800 19562
N11901 403
N14688 100
N12886B 32
T00014 14
T16487 13
View(df2)
id2 value2
N11959_N11901 56
T03938_N16439_T05162_T05141_N14997 654
N12800 1234
N12886B_N12886A 75
N14688 14
T18332_T16487_T13537_T11268_T09399 61
Can you suggest a solution for this "partial" ID matching problem?
Upvotes: 0
Views: 53
Reputation: 1441
If you tried separate(), you are already familiar with tidyr. Does lengthening df2 give you what you need to perform the join?
library(dplyr)
library(tidyr)

# Split each id2 on "_" so that every component id gets its own row
unnest(
  mutate(
    df2,
    id2 = strsplit(id2, split = "_")
  ),
  id2
)
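Once df2 is in long form, the join itself could look something like the sketch below. It rebuilds the sample data from the question, and it assumes the id columns are character vectors (not factors) and that "_" only ever appears as a separator. The names df2_long, id2_full, and merged are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df1 <- tibble(
  id1 = c("N12800", "N11901", "N14688", "N12886B", "T00014", "T16487"),
  value1 = c(19562, 403, 100, 32, 14, 13)
)
df2 <- tibble(
  id2 = c("N11959_N11901", "T03938_N16439_T05162_T05141_N14997", "N12800",
          "N12886B_N12886A", "N14688", "T18332_T16487_T13537_T11268_T09399"),
  value2 = c(56, 654, 1234, 75, 14, 61)
)

# Lengthen df2: one row per component id.
# id2_full (hypothetical name) keeps the original compound id for reference.
df2_long <- df2 %>%
  mutate(
    id2_full = id2,
    id2 = strsplit(id2, split = "_")
  ) %>%
  unnest(id2)

# Exact-match join on the component ids; rows of df1 without a match get NA
merged <- left_join(df1, df2_long, by = c("id1" = "id2"))
```

With this sample data every id1 matches at most one row of df2_long, so merged keeps the six rows of df1. If the same component id could appear in more than one compound id2, the join would return one row per match, and you would need to decide how to collapse those.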
Upvotes: 1