Vaclav Lokaj
Vaclav Lokaj

Reputation: 13

Levenshtein distance algorithm without delete operation

I modified Levenshtein distance algorithm form geeksforgeeks using full matrix. I deleted a delete operation (prevRow[j]) and it works now well only for specific order of input string.

cout << levenshteinFullMatrix("hellox", "xhello"); // 5 correct 
cout <<levenshteinFullMatrix("xhello", "hellox"); // 2 wrong

Can somebody show me how should i modified function below to works correct, and also explain me a reason why it is now sensitive on order of function arguments?

Many thanks

// C++ code for the above approach:
#include <bits/stdc++.h>
using namespace std;

int levenshteinFullMatrix(const string &str1, const string &str2) {
        int m = str1.length();
        int n = str2.length();

        // Initialize the DP table
        vector<vector<int>> dp(m + 1, vector<int>(n + 1, 0));
    
        // Fill the first row and column (only insertions allowed)
        for (int i = 0; i <= m; i++) {
            dp[i][0] = i; // Insertions to match str1 to an empty str2
        }
        for (int j = 0; j <= n; j++) {
            dp[0][j] = j; // Insertions to match an empty str1 to str2
        }
    
        // Compute the DP table
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (str1[i - 1] == str2[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1]; // No cost for matching characters
                } else {
                    // Minimum cost between insertion or replacement
                    dp[i][j] = min(dp[i - 1][j - 1], dp[i][j - 1]) + 1;
                }
            }
        }

}

// Drivers code
int main()
{
    // Function Call
    cout << levenshteinFullMatrix("hellox", "xhello"); // 5 correct 
    cout << levenshteinFullMatrix("xhello", "hellox"); // 2 wrong
    return 0;
}

I try to edit indexes, search for solutions on google and also on StackOverflow.

Upvotes: 1

Views: 36

Answers (2)

n. m. could be an AI
n. m. could be an AI

Reputation: 120059

The two argument strings are symmetric. Deleting a character from the left argument and inserting a character to the right argument are the same thing. Deleting a character from the right argument and inserting a character to the left argument are the same thing. If you disallow both deleting-from-right and deleting-from-left, you at the same time disallow both insertions, and you can only compare strings of an equal length, and you don't need DP for that. Just compare the characters at the same position, and add up the number of unequal pairs.

If you only want disallow delete-from-left (which is the same as insert-to-right), you need to modify the first for loop:

      for (int i = 0; i <= m; i++) {
          dp[i][0] = n+m+1; // A large value means editing is impossible
                            // Insertions to match str1 to an empty str2 
                            // ARE NOT ALLOWED
      }

Upvotes: 1

m.raynal
m.raynal

Reputation: 3133

Let's first see what edit operations you allow and the cost you associate to them:
Case: matching characters:
dp[i][j] = dp[i - 1][j - 1]; -> cost is 0 (operation 1)
Case: mismatching characters: we take the min between
dp[i - 1][j - 1] -> mismatching characters, cost 1 to replace (operation 2)
dp[i][j - 1] -> append a character on the second string, cost 1 (operation 3)

We can already see that your edit operations are not symmetric, so it makes sense that the results may differ when we swap the strings.

Now let's see the example you provided:
cout << levenshteinFullMatrix("hellox", "xhello"); // 5 correct
"", "" -> initial case, total cost is 0
"h", "x" -> op2, cost 1, total cost is 1
"he", "xh" -> op2, cost 1, total cost is 2
"hel", "xhe" -> op2, cost 1, total cost is 3
"hell", "xhel" -> op1, cost 0, total cost is 3
"hello", "xhell" -> op2, cost 1, total cost is 4
"hellox", "xhello" -> op2, cost 1, total cost is 5

cout <<levenshteinFullMatrix("xhello", "hellox"); // 2 wrong
"x", "" -> initial case, total cost is 1
"xh", "h" -> op1, cost , total cost is 1
"xhel", "hel" -> op1, cost 0, total cost is 1
"xhell", "hell" -> op1, cost 0, total cost is 1
"xhello", "hello" -> op1, cost 0, total cost is 1
"xhello", "hellox" -> op3, cost 1, total cost is 2

Ultimately, the issue with your approach is that you chose non-symmetric operations. You allow to add a letter on the second string during all the DP process, but you allow to add a letter on the first string only during the initialization phase.
If the edit operations were symmetric, you would get the same result: 2 if you allow to add/remove a letter on any string for a cost of 1 during all the DP process ; or 5 if you only allow for replacements.

Upvotes: 1

Related Questions