Reputation: 830
I have a fairly large data.frame with outdated names, and I want to replace them with the correct names, which are stored in another data.frame.
I am using the stringdist function to find the closest match between the two name columns, and then I want to put the new names in the original data.frame.
My code is based on the sapply function, as in the following example:
dat1 <- data.frame("name" = paste0("abc", 1:5),
                   "value" = round(rnorm(5), 1))
dat2 <- data.frame("name" = paste0("abd", 1:5),
                   "other_info" = 11:15)
# For each name in dat1, pick the closest name in dat2 by string distance
dat1$name2 <- sapply(dat1$name,
                     function(x) {
                       char_min <- stringdist::stringdist(x, dat2$name)
                       dat2[which.min(char_min), "name"]
                     })
dat1
However, this code is too slow given the size of my data.frame.
Is there a faster alternative, for example using the data.table R package?
Upvotes: 2
Views: 2328
Reputation: 53
First load data.table and convert the data frames into data tables:
library(data.table)
dat1 <- as.data.table(dat1)
dat2 <- as.data.table(dat2)
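Then use the := operator together with the amatch function to create a new column holding the closest name from dat2. Note that amatch defaults to maxDist = 0.1 and returns NA when no candidate is within that distance, so set maxDist = Inf to always get the nearest name:
dat1[, name2 := dat2$name[stringdist::amatch(name, dat2$name, maxDist = Inf)]]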
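This should be much faster than the sapply version, because amatch performs the whole search in one vectorized call instead of one R-level call per name. Hope this helps!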
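As a quick sanity check, here is a minimal sketch comparing the per-name sapply approach with a single amatch call on the toy names from the question. It assumes the stringdist package is installed; the variable names old_names, new_names, slow and fast are made up for this illustration, and plain character vectors are used so it does not depend on data.table.

```r
library(stringdist)  # assumed to be installed

# Toy names mirroring the question's example
old_names <- paste0("abc", 1:5)
new_names <- paste0("abd", 1:5)

# Original approach: one stringdist() call per name
slow <- sapply(old_names, function(x) {
  new_names[which.min(stringdist(x, new_names))]
}, USE.NAMES = FALSE)

# Single vectorized call; maxDist = Inf so the nearest name is always returned
fast <- new_names[amatch(old_names, new_names, maxDist = Inf)]

# Both approaches should pick the same names
identical(slow, fast)
```

If some names may have no sensible counterpart, a finite maxDist (for example maxDist = 2) could be used instead, in which case unmatched rows come back as NA and can be reviewed by hand.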
Upvotes: 1