Reputation: 393
I have two data frames, a and b, and want to compare certain columns between them. Everything worked fine until this error came up:
Error in mutate_impl(.data, dots) :
Evaluation error: STRING_ELT() can only be applied to a 'character vector', not a 'integer'.
My code:
library(RecordLinkage)
library(dplyr)
lookup <- expand.grid(target = a$NAME, source = b$WHOLE_NAME, stringsAsFactors = FALSE)
y <- lookup %>%
  group_by(target) %>%
  mutate(match_score = jarowinkler(target, source)) %>%
  summarise(match = match_score[which.max(match_score)],
            matched_to = source[which.max(match_score)]) %>%
  inner_join(b, by = c("matched_to" = "WHOLE_NAME"))
Upvotes: 0
Views: 2101
Reputation: 8105
Without example data it is difficult to know for sure, but I can reproduce the error when the column with the names in a and/or b is a factor.
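A quick way to check whether this is the cause is to look at the column classes and convert any factors to character before building the lookup table. The following is only a sketch with made-up toy data (the values, and the choice to make a$NAME a factor, are assumptions for illustration, not the asker's data). Note that stringsAsFactors = FALSE in expand.grid only affects character input; a column that is already a factor stays a factor.

```r
# Hypothetical toy data: NAME is deliberately a factor to mimic the problem.
a <- data.frame(NAME = c("foo", "bar", "aargh"), stringsAsFactors = TRUE)
b <- data.frame(WHOLE_NAME = c("foob", "baar", "flierp"), stringsAsFactors = FALSE)

# Inspect the column types; a factor here would explain the error.
class(a$NAME)
class(b$WHOLE_NAME)

# Convert to character before building the lookup table.
# as.character() is harmless on columns that are already character.
a$NAME <- as.character(a$NAME)
b$WHOLE_NAME <- as.character(b$WHOLE_NAME)

lookup <- expand.grid(target = a$NAME, source = b$WHOLE_NAME,
                      stringsAsFactors = FALSE)
str(lookup)
```

If both columns of lookup are reported as chr, the original jarowinkler call should no longer receive integer-coded factors.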
One solution is to use the string comparison functions from the stringdist package:
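library(dplyr)

a <- data.frame(names = c("foo", "bar", "aargh"), stringsAsFactors = FALSE)
b <- data.frame(wholename = c("foob", "baar", "flierp"), stringsAsFactors = FALSE)
lookup <- expand.grid(target = a$names, source = b$wholename, stringsAsFactors = FALSE)
# stringsim() returns a similarity (1 - distance), so which.max() picks the best match;
# stringdist() returns a distance, where smaller means more similar.
# p = 0.1 applies the Winkler prefix adjustment; the default p = 0 gives plain Jaro.
y <- lookup %>%
  group_by(target) %>%
  mutate(match_score = stringdist::stringsim(target, source, method = "jw", p = 0.1)) %>%
  summarise(match = match_score[which.max(match_score)],
            matched_to = source[which.max(match_score)]) %>%
  inner_join(b, by = c("matched_to" = "wholename"))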
Another solution is to use the reclin package (of which I am the author):
library(reclin)
names(b) <- "names"
pair_blocking(a, b) %>%
compare_pairs(by = c("names"), default_comparator = jaro_winkler()) %>%
select_n_to_m(weight = "names") %>%
link()
Upvotes: 2