R Merge 2 tables/dataframes by partial match

Question

I have two tables/dataframes.
The first Table (ID) one looks like this:

The second table (Names) looks like this:

I want to match the "IDTag" variable to the first few letters of the "Name" variable. In other programming languages I would do a foreach and run through each of the IDTags for each of the rows of the second table (matching the IDTag to the first n characters of the "Name" variable where n is the number of characters of the IDTag in question.

In R it seems like there should be a method for doing this and I have looked at pmatch and a few others but those either don't appear to make the match at all or when I try to use them come up with several NAs in places where I wouldn't have expected them (Sample code using the table data above:

NameMatches <- Names[pmatch(
                  ID$IDTag,
                  Names$Name,
                  duplicates.ok = TRUE
                ),]

I have the feeling I am going about this with the wrong theory or concept so I am looking to see if someone can guide on the simplest/clearest way to do this accurately.

Editing original question to reply to comments...

The expected output would look something like this (i.e. - all of the columns of the Names table with the addition of the Group column from the ID table. Multiple matches are expected - one to many relationship between ID and Names tables):

Thanks,

Tim Biegeleisen · Accepted Answer

If you are open to using the sqldf package, then one option would be to just write a join using the logic you gave us:

library(sqldf)
sql <- "SELECT * FROM ID t1 INNER JOIN Names t2
        ON t2.Name LIKE t1.IDTag || '%'"

output <- sqldf(sql)

Note: If you want to keep all rows from the ID data frame, regardless of whether or not they match to anything in Names, then use a left join instead.

R Merge 2 tables/dataframes by partial match

Answers (1)

Related Questions