YAKOVM
YAKOVM

Reputation: 10153

Algorithm to find pattern based string match

There are 2 input string r, s. The algorithm checks if there is a match between them: every char in r matches only one non - empty substring in s. And different chars in r match different substrings in s. For example if r = "ABA" and s = "hibiehi". there is a match A = "hi", B = "bie" But if r = "ABA" and s = "hibie" they don`t match.

Upvotes: 0

Views: 381

Answers (1)

kcsquared
kcsquared

Reputation: 5409

You can do this with depth-first search, also called backtracking. We can greedily try to match each letter of the pattern to substrings of the word, and backtrack upon failure. Keep track of the previous matches (and inverse matches) using a hashmap.

The running time of this approach is exponential (in the pattern length, at least), although I haven't analyzed it exactly. You can speed this up with early stopping/pruning the search tree, but it would take a new idea to get a fully polynomial runtime.

Python implementation:

def word_pattern(pattern: str, word: str) -> bool:
    """Given a pattern string, check if 'word' can match pattern.
    'Match' here means a bijection between each character in pattern
    with a nonempty substring in word.
    """

    pattern_len = len(pattern)
    word_len = len(word)

    def dfs(biject_pat_to_word: Dict[str, str],
            already_used_strs: Set[str],
            pattern_index: int,
            word_index: int) -> bool:
        """Greedily try to match each pattern character to the shortest
        possible substring of word, consistent with current bijection.
        Return whether this was possible."""

        if pattern_index == pattern_len:  # Reached pattern end
            return word_index == word_len
        next_letter = pattern[pattern_index]

        # If we've already seen this pattern char, it must match again
        if next_letter in biject_pat_to_word:
            pat_match = biject_pat_to_word[next_letter]
            if word[word_index:word_index + len(pat_match)] != pat_match:
                return False

            word_index += len(pat_match)
            pattern_index += 1

            return dfs(biject_pat_to_word, already_used_strs,
                       pattern_index, word_index)

        curr_str_match = ''
        for amount_to_take in range(1, word_len - word_index + 1):
            curr_str_match += word[word_index + amount_to_take - 1]
            if curr_str_match in already_used_strs:
                continue

            biject_pat_to_word[next_letter] = curr_str_match
            already_used_strs.add(curr_str_match)

            # Try to use this pattern
            if dfs(biject_pat_to_word, already_used_strs,
                   pattern_index=pattern_index + 1,
                   word_index=word_index + amount_to_take):
                return True

            already_used_strs.discard(curr_str_match)

        # If we've set which string we match, unset it for future calls
        if next_letter in biject_pat_to_word:
            del biject_pat_to_word[next_letter]

        return False

    return dfs(biject_pat_to_word={},
               already_used_strs=set(),
               pattern_index=0, word_index=0)

Upvotes: 1

Related Questions