pattern match dynamic programming for advice

Question

Working on below pattern match problem. And post detailed problem statement and code. The code is working. And in the below implementation, it is looped for pattern in outer loop, then internal loop for source string to match -- in order to build the two dimensional DP table.

My question is, if I change the implementation, which outer loop is for the source string to match, and internal loop is for the pattern. Will there be any performance gain, or any functional defects? Any advice about which flavor is better, or almost the same is appreciated.

More specifically, I mean change loop from below (using similar logics for the content of the loop),

    for i in range(1, len(p) + 1):
        for j in range(1, len(s) + 1):

to,

    for i in range(1, len(s) + 1):
        for j in range(1, len(p) + 1):

Problem Statement

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

class Solution(object):

    def isMatch(self, s, p):
        # The DP table and the string s and p use the same indexes i and j, but
        # table[i][j] means the match status between p[:i] and s[:j], i.e.
        # table[0][0] means the match status of two empty strings, and
        # table[1][1] means the match status of p[0] and s[0]. Therefore, when
        # refering to the i-th and the j-th characters of p and s for updating
        # table[i][j], we use p[i - 1] and s[j - 1].

        # Initialize the table with False. The first row is satisfied.
        table = [[False] * (len(s) + 1) for _ in range(len(p) + 1)]

        # Update the corner case of matching two empty strings.
        table[0][0] = True

        # Update the corner case of when s is an empty string but p is not.
        # Since each '*' can eliminate the charter before it, the table is
        # vertically updated by the one before previous. [test_symbol_0]
        for i in range(2, len(p) + 1):
            table[i][0] = table[i - 2][0] and p[i - 1] == '*'

        for i in range(1, len(p) + 1):
            for j in range(1, len(s) + 1):
                if p[i - 1] != "*":
                    # Update the table by referring the diagonal element.
                    table[i][j] = table[i - 1][j - 1] and \
                                  (p[i - 1] == s[j - 1] or p[i - 1] == '.')
                else:
                    # Eliminations (referring to the vertical element)
                    # Either refer to the one before previous or the previous.
                    # I.e. * eliminate the previous or count the previous.
                    # [test_symbol_1]
                    table[i][j] = table[i - 2][j] or table[i - 1][j]

                    # Propagations (referring to the horizontal element)
                    # If p's previous one is equal to the current s, with
                    # helps of *, the status can be propagated from the left.
                    # [test_symbol_2]
                    if p[i - 2] == s[j - 1] or p[i - 2] == '.':
                        table[i][j] |= table[i][j - 1]

        return table[-1][-1]

thanks in advance, Lin

dejvuth · Accepted Answer

If you swap the loop, i would be the index for s, and j would be the index for p. You'll need to swap i and j everywhere in the loop.

    for i in range(1, len(s) + 1):
        for j in range(1, len(p) + 1):
            if p[j - 1] != "*":
                # Update the table by referring the diagonal element.
                table[j][i] = table[j - 1][i - 1] and \
                              (p[j - 1] == s[i - 1] or p[j - 1] == '.')
            else:
                # Eliminations (referring to the vertical element)
                # Either refer to the one before previous or the previous.
                # I.e. * eliminate the previous or count the previous.
                # [test_symbol_1]
                table[j][i] = table[j - 2][i] or table[j - 1][i]

                # Propagations (referring to the horizontal element)
                # If p's previous one is equal to the current s, with
                # helps of *, the status can be propagated from the left.
                # [test_symbol_2]
                if p[j - 2] == s[i - 1] or p[j - 2] == '.':
                    table[j][i] |= table[j][i - 1]

The original algorithm fills table row-by-row (first row 1, then 2, 3, ...). After the swap, the table will be filled column-by-column (first column 1, then 2, 3, ...).

The idea of the algorithm remains unchanged, since each element in table is defined via elements on the previous column or rows---the elements you already calculated whether you do it row-by-row or column-by-column.

In details, table[j][i] is defined via the diagonal element of the previous column table[j-1][i-1]; or the elements in previous rows and/or column table[j-2][i], table[j-1][i], and/or table[j][i-1].

Therefore, the performance is the same after the swap. In both versions, each calculation of a table element requires a constant time. The total time for constructing table is O(len(s) * len(p)).

The functionality is also the same after the swap. Basically, if the original one is correct, then the modified version is also correct. Whether or not the original one is correct is another story though ...

Let's look at the original version. At first glance, it seems to have indexing problems in two places when i = 1: table[i - 2][j] and p[i - 2].

However, Python interprets the index -1 as the last element. So, table[-1][j] refers to the last row of table, in which all elements are False. So, table[1][j] = table[-1][j] or table[0][j] is equivalent to table[1][j] = table[0][j].

For p[-1], notice that you only access it in the if-statement, when p[0] = * (which doesn't make sense for matching). It doesn't matter what the value of p[-1] is, because it won't effect the value of table[i][j]. To see this: If the result of the if-statement happens to be True, we know that table[1][0] is initially False, so table[1][1], table[1][2], ... must also be False. In other words, p[0] = * won't match any strings.

pattern match dynamic programming for advice

Answers (1)

Related Questions