Edit Distance - Proving that Greedy Choice is Optimal

Question

I was recently solving the following leet code problem "Edit Distance". I came up with the following code, which passes:

        def dfs(i, j):
            if j == len(word2) or i == len(word1):
                return (len(word1) - i) + (len(word2) - j)
            elif (i, j) in dp:
                return dp[(i, j)]
            
            res = float("inf")

            if word1[i] == word2[j]:
                res = min(res, dfs(i + 1, j + 1))
            
            res = min(res, 1 + dfs(i, j + 1), 1 + dfs(i + 1, j), 1 + dfs(i + 1, j + 1))
            
            dp[(i, j)] = res
            return dp[(i, j)]
        
        dp = {}
        return dfs(0, 0)

However, as I looked through solutions, I noticed that many users employed a slight optimization compared to my code; I adapted my code to represent this optimization.

        def dfs(i, j):
            if j == len(word2) or i == len(word1):
                return (len(word1) - i) + (len(word2) - j)
            elif (i, j) in dp:
                return dp[(i, j)]
            
            res = float("inf")

            if word1[i] == word2[j]:
                res = min(res, dfs(i + 1, j + 1))
            else:
                res = min(res, 1 + dfs(i, j + 1), 1 + dfs(i + 1, j), 1 + dfs(i + 1, j + 1))
            
            dp[(i, j)] = res
            return dp[(i, j)]
        
        dp = {}
        return dfs(0, 0)

The change is subtle and can be summarized in the following way. If we identify a match between two indices i and j, we can greedily choose to proceed with the subproblem represented by indices i + 1 and j + 1, completely ignoring the subproblems represented by replacing, deleting, or inserting at the previous index. In other words, if a given index pair has a match, we can guarantee that performing no operations at this point will yield a more optimal solution than replacing, deleting, or inserting at this index. However, I am struggling to prove that this statement is true. At a given subproblem, it is obvious that performing no operations is better than replacing, since both result in the same sub-problem (i + 1, j + 1). However, we must add one to the replace total to reflect replacing at index i; thus, it is obvious that no operations will always produce a more optimal result. However, for the cases of inserting or deleting an index i, proving no-operation optimality is not so simple. Namely, inserting or deleting lead to different sub-problems that do not blatantly imply that no operations will yield an optimal result. In the case of inserting, we add one to account for the insertion at index i and proceed with the sub-problem (i, j + 1). On the other hand, for deleting, we add one to acccount for the deletion at index i and proceed with the subproblem (i + 1, j). As such, I need to somehow prove that the result of subproblem 1 + (i, j + 1) and 1 + (i + 1, j) are both less than or equal to the result of subproblem (i + 1, j + 1). I do not know how to proceed. Any advice?

Edit Distance - Proving that Greedy Choice is Optimal

Answers (0)

Related Questions