jerry_k

Reputation: 393

STRING_ELT() can only be applied to a 'character vector', not a 'integer'

I have two data frames, a and b, and want to compare certain columns between them. Everything worked fine until this error came up:

Error in mutate_impl(.data, dots) : 
Evaluation error: STRING_ELT() can only be applied to a 'character vector', not a 'integer'.

My code:

library(RecordLinkage)
library(dplyr)

lookup <- expand.grid(target = a$NAME, source = b$WHOLE_NAME, stringsAsFactors = FALSE)

y <- lookup %>% group_by(target) %>%
   mutate(match_score = jarowinkler(target, source)) %>%
   summarise(match = match_score[which.max(match_score)],
             matched_to = source[which.max(match_score)]) %>%
   inner_join(b, by = c("matched_to" = "WHOLE_NAME"))

Upvotes: 0

Views: 2101

Answers (1)

Jan van der Laan

Reputation: 8105

Without example data it is difficult to know for sure, but I can reproduce the error when the name column in a and/or b is a factor.
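If the columns do turn out to be factors, converting them to character before building the lookup table may already be enough. Below is a minimal sketch using only base R; the data is made up for illustration, with the column names taken from the question, and stringsAsFactors = TRUE is set explicitly to mimic the default of R versions before 4.0.

```r
# Hypothetical example data; stringsAsFactors = TRUE forces factor columns,
# which was the data.frame() default before R 4.0.
a <- data.frame(NAME = c("foo", "bar", "aargh"), stringsAsFactors = TRUE)
b <- data.frame(WHOLE_NAME = c("foob", "baar", "flierp"), stringsAsFactors = TRUE)

# Check whether the name columns are factors.
is.factor(a$NAME)
is.factor(b$WHOLE_NAME)

# Convert them to plain character vectors.
a$NAME <- as.character(a$NAME)
b$WHOLE_NAME <- as.character(b$WHOLE_NAME)

# expand.grid() leaves factor inputs as factors even with
# stringsAsFactors = FALSE, so the conversion has to happen before this call.
lookup <- expand.grid(target = a$NAME, source = b$WHOLE_NAME,
                      stringsAsFactors = FALSE)
str(lookup)
```

With both columns of lookup now of type character, the original jarowinkler() call should receive the input type it expects.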

One solution is to use the stringdist function from the package stringdist:

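library(dplyr)

a <- data.frame(names = c("foo", "bar", "aargh"), stringsAsFactors = FALSE)
b <- data.frame(wholename = c("foob", "baar", "flierp"), stringsAsFactors = FALSE)

lookup <- expand.grid(target = a$names, source = b$wholename, stringsAsFactors = FALSE)

# stringdist() returns a distance (0 = identical strings), so it is turned into
# a similarity with 1 - d before the best match is picked with which.max().
# p = 0.1 applies the Winkler prefix weight; the default p = 0 gives plain Jaro.
y <- lookup %>% group_by(target) %>%
   mutate(match_score = 1 - stringdist::stringdist(target, source, method = "jw", p = 0.1)) %>%
   summarise(match = match_score[which.max(match_score)],
             matched_to = source[which.max(match_score)]) %>%
   inner_join(b, by = c("matched_to" = "wholename"))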

Another solution is to use the reclin package (of which I am the author):

library(reclin)

names(b) <- "names"

pair_blocking(a, b) %>% 
  compare_pairs(by = c("names"), default_comparator = jaro_winkler()) %>% 
  select_n_to_m(weight = "names") %>% 
  link()

Upvotes: 2
