user1504088
user1504088

Reputation:

Is it possible to do a Levenshtein distance in Excel without having to resort to Macros?

Let me explain.

I have to do some fuzzy matching for a company, so ATM I use a levenshtein distance calculator, and then calculate the percentage of similarity between the two terms. If the terms are more than 80% similar, Fuzzymatch returns "TRUE".

My problem is that I'm on an internship, and leaving soon. The people who will continue doing this do not know how to use excel with macros, and want me to implement what I did as best I can.

So my question is : however inefficient the function may be, is there ANY way to make a standard function in Excel that will calculate what I did before, without resorting to macros ?

Thanks.

Upvotes: 10

Views: 30233

Answers (4)

Ewan Hume
Ewan Hume

Reputation: 1

Actually, I think I just found a workaround. I was adding it in the wrong part of the code...

Adding this line

  } else if(b.charAt(i-1)==a.charAt(j) && b.charAt(i)==a.charAt(j-1)){
    val = row[j-1]-0.33;  //transposition

so it now reads

  if(b.charAt(i-1) == a.charAt(j-1)){
    val = row[j-1]; // match
  } else if(b.charAt(i-1)==a.charAt(j) && b.charAt(i)==a.charAt(j-1)){
    val = row[j-1]-0.33;  //transposition
  } else {
    val = Math.min(row[j-1] + 1, // substitution
                   prev + 1,     // insertion
                   row[j] + 1);  // deletion 
  } 

Seems to fix the problem. Now 'biulding' is 92% accurate and 'bilding' is 88%. (whereas with the original formula 'biulding' was only 75%... despite being closer to the correct spelling of building)

Upvotes: 0

duality
duality

Reputation: 1223

If you came about this googling something like levenshtein distance google sheets

I threw this together, with the code comment from milot-midia on this gist (https://gist.github.com/andrei-m/982927 - code under MIT license)

  • From Sheets in the header menu, Tools -> Script Editor
  • Name the project
    • The name of the function (not the project) will let you use the func
  • Paste the following code

function Levenshtein(a, b) {
  if(a.length == 0) return b.length; 
  if(b.length == 0) return a.length;

  // swap to save some memory O(min(a,b)) instead of O(a)
  if(a.length > b.length) {
    var tmp = a;
    a = b;
    b = tmp;
  }

  var row = [];
  // init the row
  for(var i = 0; i <= a.length; i++){
    row[i] = i;
  }

  // fill in the rest
  for(var i = 1; i <= b.length; i++){
    var prev = i;
    for(var j = 1; j <= a.length; j++){
      var val;
      if(b.charAt(i-1) == a.charAt(j-1)){
        val = row[j-1]; // match
      } else {
        val = Math.min(row[j-1] + 1, // substitution
                       prev + 1,     // insertion
                       row[j] + 1);  // deletion
      }
      row[j - 1] = prev;
      prev = val;
    }
    row[a.length] = prev;
  }

  return row[a.length];
}

You should be able to run it from a spreadsheet with

=Levenshtein(cell_1,cell_2)

Upvotes: 17

richardtallent
richardtallent

Reputation: 35374

While it can't be done in a single formula for any reasonably-sized strings, you can use formulas alone to compute the Levenshtein Distance between strings using a worksheet.

Here is an example that can handle strings up to 15 characters, it could be easily expanded for more:

https://docs.google.com/spreadsheet/ccc?key=0AkZy12yffb5YdFNybkNJaE5hTG9VYkNpdW5ZOWowSFE&usp=sharing

This isn't practical for anything other than ad-hoc comparisons, but it does do a decent job of showing how the algorithm works.

Upvotes: 2

SeanC
SeanC

Reputation: 15923

looking at the previous answers to calculating Levenshtein distance, I think it would be impossible to create it as a formula.

Take a look at the code here

Upvotes: 0

Related Questions