Reputation: 209
I have been trying to understand this code for while but I can't get how the distances are calculated. I know how the algorithm should work (it provides the edit distance in characters between two strings and I can do it on paper), but I don't understand how this line (as an example)
d(i, j) = d(i - 1, j - 1)
comes back as an integer. Same for min1, min2. We already define
d(i,0)
and
d(0,j)
, but how does it get a value when
d(i,j)
is not 0 in one of the two arguments?
Code:
Option Explicit
Public Function Levenshtein(s1 As String, s2 As String)
Dim i As Integer
Dim j As Integer
Dim l1 As Integer
Dim l2 As Integer
Dim d() As Integer
Dim min1 As Integer
Dim min2 As Integer
l1 = Len(s1)
l2 = Len(s2)
ReDim d(l1, l2)
For i = 0 To l1
d(i, 0) = i
Next
For j = 0 To l2
d(0, j) = j
Next
For i = 1 To l1
For j = 1 To l2
If Mid(s1, i, 1) = Mid(s2, j, 1) Then
d(i, j) = d(i - 1, j - 1)
Else
min1 = d(i - 1, j) + 1
min2 = d(i, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
min2 = d(i - 1, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
d(i, j) = min1
End If
Next
Next
Levenshtein = d(l1, l2)
End Function
?Levenshtein("saturday","sunday")
3
Upvotes: 1
Views: 956
Reputation: 2569
I don't have much clue about Levenshtein, however I can explain what's going on in that array, please see comments in the code:
l1 = Len(s1) 'this sets a variable with the length of the first string
l2 = Len(s2) 'this sets a variable with the length of the second string
ReDim d(l1, l2) 'redimension the array to this size, though this can be seen as ReDim d(0 to l1, 0 to l2)
For i = 0 To l1 'for i = 0 to length of first string
d(i, 0) = i 'allocate the value of i to each row in the array, at col 0
Next i
For j = 0 To l2 'for j = 0 to length of second string
d(0, j) = j 'allocate the value of j to each column in the array, at row 0
Next j
EDIT: I've added some debugging (will print to ActiveSheet). Blue for where the letters match, red for the others... though the colour doesn't matter.
As far as I can tell, with each loop iteration, it builds a matrix of the differences, starting from comparing 1:1 first letters, till the last. Based on the loop position, and with each iteration, it will use previous position values to calculate the difference at current position.
This is probably easier to follow with shorter strings.
Loop through each letter in second word, then loop through each letter in first word
at first outer loop (for each row), first inner loop (for each column), we have S
meet S
= 0. (d(i, j) = d(i - 1, j - 1)
evaluates to d(i, j) = d(1 - 1, 1 - 1)
, i.e.: 0
)
at first outer loop, second inner loop, we have S
meet U
= 1.
etc, etc.
Other than that, step through the code and see how the variables change based on the if conditions... not sure i can explain this any better.
Public Function Levenshtein(s1 As String, s2 As String)
Dim i As Integer, j As Integer
Dim l1 As Integer, l2 As Integer
Dim min1 As Integer, min2 As Integer
Dim d() As Integer
'For debugging purposes only
Cells.Clear
Dim rngOutput As Range: Set rngOutput = ActiveSheet.Range("A1").Resize(Len(s1) + 2, Len(s2) + 2)
With rngOutput
.ColumnWidth = 3
.HorizontalAlignment = xlCenter
.VerticalAlignment = xlCenter
End With
l1 = Len(s1): l2 = Len(s2): ReDim d(l1, l2)
For i = 0 To l1
d(i, 0) = i
With rngOutput
.Cells(i + 3, 1) = Mid(s1, i + 1, 1)
If Not i = 0 Then .Cells(i + 2, 2) = i
End With
Next i
For j = 0 To l2
d(0, j) = j
With rngOutput
.Cells(1, j + 3) = Mid(s2, j + 1, 1)
If Not j = 0 Then .Cells(2, j + 2) = j
End With
Next j
For i = 1 To l1
For j = 1 To l2
If Mid(s1, i, 1) = Mid(s2, j, 1) Then
d(i, j) = d(i - 1, j - 1)
With rngOutput.Cells(i + 2, j + 2)
.Value = d(i, j)
.Font.Color = vbBlue
End With
Else
min1 = d(i - 1, j) + 1
min2 = d(i, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
min2 = d(i - 1, j - 1) + 1
If min2 < min1 Then
min1 = min2
End If
d(i, j) = min1
With Cells(i + 2, j + 2)
.Value = d(i, j)
.Font.Color = vbRed
End With
End If
Next
Next
Levenshtein = d(l1, l2)
End Function
Upvotes: 1