RobertP.

Reputation: 285

R - calculate pairwise orthographic similarity of a list

I need to calculate orthographic similarity (edit/Levenshtein distance) among words in a given corpus.

The R package vwr seems to be able to calculate this:

coltheart.N(list1, list2)

in which the Levenshtein distance is computed pairwise between the matching words of the two word lists.

I was wondering if there was a way to calculate the Levenshtein distance between all possible word combinations of a given word list. Can somebody give me a hint?

Upvotes: 2

Views: 319

Answers (1)

LAP

Reputation: 6685

You can use the function levenshtein.distance from the package vwr and apply it to each word in the list, comparing that word against the whole list:

library(vwr)

wordlist <- list("but", "nut", "rut")

output <- lapply(wordlist, function(x) levenshtein.distance(x, wordlist))

> output
[[1]]
but nut rut 
  0   1   1 

[[2]]
but nut rut 
  1   0   1 

[[3]]
but nut rut 
  1   1   0 

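This produces a warning because a list, rather than a character vector, is passed on to stringdist. The distances are correct, so the warning can be ignored; passing a character vector instead (for example unlist(wordlist)) should avoid it.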


Edit:

To assign the words as names for the list items, just use

names(output) <- wordlist

> output
$but
but nut rut 
  0   1   1 

$nut
but nut rut 
  1   0   1 

$rut
but nut rut 
  1   1   0 
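
If a single distance matrix is more convenient than a list of vectors, the same all-pairs comparison can be sketched with base R alone. This is an alternative to the vwr approach above, not part of it: it assumes adist() from the utils package (which computes generalized Levenshtein distance) is acceptable for your purposes, and the word "butter" and the variable names are made up for illustration.

```r
# Assumption: utils::adist() from base R is used here instead of vwr.
words <- c("but", "nut", "rut", "butter")

# With a single argument, adist() compares every element with every other one
dist_matrix <- adist(words)
dimnames(dist_matrix) <- list(words, words)

# Long format: keep each unordered pair once by taking the upper triangle
idx <- which(upper.tri(dist_matrix), arr.ind = TRUE)
pairs <- data.frame(
  word1    = words[idx[, "row"]],
  word2    = words[idx[, "col"]],
  distance = dist_matrix[idx]
)

print(dist_matrix)
print(pairs)
```

The matrix is symmetric with zeros on the diagonal, so the long-format table holds n * (n - 1) / 2 rows for n words. For large corpora the full matrix grows quadratically, so it may be worth computing only the pairs you need.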

Upvotes: 3
