Reputation: 173
The Damerau-Levenshtein distance between the two strings "abc" and "acb" would be 1, because it involves one transposition between "b" and "c".
> stringdist("abc", "acb", method = "dl")
[1] 1
Now suppose that I have the following two character vectors:
A = c("apple", "banana", "citrus")
B = c("apple", "citrus", "banana")
How can I calculate the Damerau-Levenshtein distance between A and B so that the result is identical to the distance between "abc" and "acb", since there is one transposition between "citrus" and "banana"? In other words, how can I calculate the Damerau-Levenshtein distance between A and B so that each item is counted as one character within a string?
Upvotes: 2
Views: 922
Reputation: 173
library(stringdist)
library(tidyr)
A = c("apple", "banana", "citrus")
B = c("apple", "citrus", "banana")
a <- factor(A, levels = union(A,B)) %>%
as.numeric() %>%
sapply(function(i) letters[i]
%>% paste0(collapse = "")
) %>%
paste0(collapse = "")
b <- factor(B, levels = union(A,B)) %>%
as.numeric() %>%
sapply(function(i) letters[i]
%>% paste0(collapse = "")
) %>%
paste0(collapse = "")
stringdist(a, b, method = "dl")
Upvotes: 2
Reputation: 8572
What about
vecdist <- function(x, y){
matches <- match(x, y, nomatch = 0)
nomatch <- matches == 0
# No match = we need 1 permutation
# Other matches: Compare index, for each "not inverted" index, (not 3 vs -3) we need 1 permutation
perm_match <- (matches - seq_along(matches))[!nomatch]
perm_n <- sum(perm_match != 0) - sum(duplicated(abs(perm_match)))
sum(nomatch) + perm_n + sum(!y %in% x)
}
The basic idea here is:
x
vs y
and visa-versa. Each of these are 1 permutationduplicated(abs(...))
. Eg abcd
, badc
is 2 permutations while abcd
, bdca
is 3.That is very similar to how stringdist
works for single strings.
A = c("apple", "banana", "citrus")
B = c("apple", "citrus", "banana")
vecdist(A, B)
[1] 1
A <- c(A, 'pear')
vecdist(A, B)
[1] 2
vecdist(B, A)
[1] 2
A <- c('apple', 'banana', 'citrus', 'pear')
B <- c('pear', 'citrus', 'banana', 'apple')
vecdist(A, B)
[1] 2
vecdist(B, A)
[1] 2
A <- c('apple', 'banana', 'citrus', 'pear')
B <- c('pear', 'citrus', 'apple', 'banana')
vecdist(A, B)
[1] 3
vecdist(B, A)
[1] 3
Upvotes: 0