Reputation: 157
I have two tables/dataframes.
The first Table (ID) one looks like this:
The second table (Names) looks like this:
I want to match the "IDTag" variable to the first few letters of the "Name" variable. In other programming languages I would do a foreach and run through each of the IDTags for each of the rows of the second table (matching the IDTag to the first n characters of the "Name" variable where n is the number of characters of the IDTag in question.
In R it seems like there should be a method for doing this and I have looked at pmatch and a few others but those either don't appear to make the match at all or when I try to use them come up with several NAs in places where I wouldn't have expected them (Sample code using the table data above:
NameMatches <- Names[pmatch(
ID$IDTag,
Names$Name,
duplicates.ok = TRUE
),]
I have the feeling I am going about this with the wrong theory or concept so I am looking to see if someone can guide on the simplest/clearest way to do this accurately.
Editing original question to reply to comments...
The expected output would look something like this (i.e. - all of the columns of the Names table with the addition of the Group column from the ID table. Multiple matches are expected - one to many relationship between ID and Names tables):
Thanks,
Upvotes: 2
Views: 1031
Reputation: 522762
If you are open to using the sqldf
package, then one option would be to just write a join using the logic you gave us:
library(sqldf)
sql <- "SELECT * FROM ID t1 INNER JOIN Names t2
ON t2.Name LIKE t1.IDTag || '%'"
output <- sqldf(sql)
Note: If you want to keep all rows from the ID
data frame, regardless of whether or not they match to anything in Names
, then use a left join instead.
Upvotes: 2