I need to calculate orthographic similarity (edit/Levenshtein distance) among words in a given corpus.
The R package vwr seems to be able to calculate this with coltheart.N(list1, list2), in which the Levenshtein distance is computed pairwise between the matching words of the two word lists.
I was wondering if there was a way to calculate the Levenshtein distance between all possible word combinations of a given word list. Can somebody give me a hint?
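As a hint in a different direction (not vwr): base R's adist() function from the utils package, which is attached by default, computes the generalized Levenshtein distance. When it is given a single character vector, it returns the full matrix of pairwise distances between all words. This is a minimal sketch, assuming the words are stored in a character vector rather than a list. The word "butter" is added purely for illustration.

```r
# adist() lives in utils (attached by default). With y = NULL it compares
# every element of `words` against every other element, using unit costs
# for insertions, deletions and substitutions (plain Levenshtein distance).
words <- c("but", "nut", "rut", "butter")

d <- adist(words)

# Label rows and columns with the words themselves; this is done explicitly
# so the result is readable regardless of whether `words` carries names.
dimnames(d) <- list(words, words)
print(d)
```

Each cell d[i, j] is the edit distance between words i and j, so the matrix is symmetric with zeros on the diagonal.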
You can use the function levenshtein.distance from the package vwr and loop over every single word in the list:
library(vwr)
wordlist <- list("but", "nut", "rut")
output <- lapply(wordlist, function(x) levenshtein.distance(x, wordlist))
> output
[[1]]
but nut rut
0 1 1
[[2]]
but nut rut
1 0 1
[[3]]
but nut rut
1 1 0
R issues a warning about passing a list argument to stringdist, but the results are correct, so I'm fairly sure you can ignore it.
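To make explicit what the loop computes, here is a minimal base-R sketch of the standard dynamic-programming Levenshtein distance with unit costs. The helper lev() is hypothetical and written only for illustration. It is not part of vwr and is not necessarily how vwr implements levenshtein.distance. Nesting sapply() instead of using lapply() simplifies the result to a matrix.

```r
# Hypothetical helper (not part of vwr): classic dynamic-programming
# Levenshtein distance with cost 1 for insertion, deletion and substitution.
lev <- function(a, b) {
  s <- strsplit(a, "")[[1]]
  t <- strsplit(b, "")[[1]]
  n <- length(s)
  m <- length(t)
  # D[i + 1, j + 1] holds the distance between the first i characters
  # of a and the first j characters of b
  D <- matrix(0, nrow = n + 1, ncol = m + 1)
  D[, 1] <- 0:n  # turning a prefix of a into "" takes i deletions
  D[1, ] <- 0:m  # turning "" into a prefix of b takes j insertions
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (s[i] == t[j]) 0 else 1
      D[i + 1, j + 1] <- min(D[i, j + 1] + 1,  # deletion
                             D[i + 1, j] + 1,  # insertion
                             D[i, j] + cost)   # substitution or match
    }
  }
  D[n + 1, m + 1]
}

words <- c("but", "nut", "rut")
# Same looping idea as the lapply() version; sapply() over a character
# vector keeps the words as names and simplifies the result to a matrix
m <- sapply(words, function(x) sapply(words, function(y) lev(x, y)))
print(m)
```

The double loop over all pairs does quadratic work in the number of words (times the cost of each comparison), so for a large corpus a vectorized implementation such as adist() or the stringdist package will be considerably faster.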
Edit:
To assign the words as names for the list items, just use
names(output) <- wordlist
> output
$but
but nut rut
0 1 1
$nut
but nut rut
1 0 1
$rut
but nut rut
1 1 0
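If a single matrix is more convenient than a named list, the rows can be stacked with do.call(rbind, ...). The sketch below retypes the named list printed above so that it runs without vwr installed. With the real output object, only the do.call() line is needed.

```r
# The named list printed above, retyped by hand so this runs without vwr
output <- list(
  but = c(but = 0, nut = 1, rut = 1),
  nut = c(but = 1, nut = 0, rut = 1),
  rut = c(but = 1, nut = 1, rut = 0)
)

# Every element has the same length and the same names, so rbind() stacks
# them into a matrix: row names come from the list names, column names
# from the names of the vectors
dist_matrix <- do.call(rbind, output)
print(dist_matrix)
dist_matrix["nut", "rut"]
```

Because the result is a square, symmetric matrix, stats::as.dist() can convert it into a dist object, for example as input to hclust().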