Reputation:
Let me explain.
I have to do some fuzzy matching for a company, so ATM I use a levenshtein distance calculator, and then calculate the percentage of similarity between the two terms. If the terms are more than 80% similar, Fuzzymatch returns "TRUE".
My problem is that I'm on an internship, and leaving soon. The people who will continue doing this do not know how to use excel with macros, and want me to implement what I did as best I can.
So my question is : however inefficient the function may be, is there ANY way to make a standard function in Excel that will calculate what I did before, without resorting to macros ?
Thanks.
Upvotes: 10
Views: 30233
Reputation: 1
Actually, I think I just found a workaround. I was adding it in the wrong part of the code...
Adding this line
} else if(b.charAt(i-1)==a.charAt(j) && b.charAt(i)==a.charAt(j-1)){
val = row[j-1]-0.33; //transposition
so it now reads
if(b.charAt(i-1) == a.charAt(j-1)){
val = row[j-1]; // match
} else if(b.charAt(i-1)==a.charAt(j) && b.charAt(i)==a.charAt(j-1)){
val = row[j-1]-0.33; //transposition
} else {
val = Math.min(row[j-1] + 1, // substitution
prev + 1, // insertion
row[j] + 1); // deletion
}
Seems to fix the problem. Now 'biulding' is 92% accurate and 'bilding' is 88%. (whereas with the original formula 'biulding' was only 75%... despite being closer to the correct spelling of building)
Upvotes: 0
Reputation: 1223
If you came about this googling something like
levenshtein distance google sheets
I threw this together, with the code comment from milot-midia on this gist (https://gist.github.com/andrei-m/982927 - code under MIT license)
Tools
-> Script Editor
function Levenshtein(a, b) {
if(a.length == 0) return b.length;
if(b.length == 0) return a.length;
// swap to save some memory O(min(a,b)) instead of O(a)
if(a.length > b.length) {
var tmp = a;
a = b;
b = tmp;
}
var row = [];
// init the row
for(var i = 0; i <= a.length; i++){
row[i] = i;
}
// fill in the rest
for(var i = 1; i <= b.length; i++){
var prev = i;
for(var j = 1; j <= a.length; j++){
var val;
if(b.charAt(i-1) == a.charAt(j-1)){
val = row[j-1]; // match
} else {
val = Math.min(row[j-1] + 1, // substitution
prev + 1, // insertion
row[j] + 1); // deletion
}
row[j - 1] = prev;
prev = val;
}
row[a.length] = prev;
}
return row[a.length];
}
You should be able to run it from a spreadsheet with
=Levenshtein(cell_1,cell_2)
Upvotes: 17
Reputation: 35374
While it can't be done in a single formula for any reasonably-sized strings, you can use formulas alone to compute the Levenshtein Distance between strings using a worksheet.
Here is an example that can handle strings up to 15 characters, it could be easily expanded for more:
https://docs.google.com/spreadsheet/ccc?key=0AkZy12yffb5YdFNybkNJaE5hTG9VYkNpdW5ZOWowSFE&usp=sharing
This isn't practical for anything other than ad-hoc comparisons, but it does do a decent job of showing how the algorithm works.
Upvotes: 2