219CID
219CID

Reputation: 440

Implementing pattern matching in a trie tree

I am currently using a trie implementation from this stack overflow post:

Getting a list of words from a Trie

to return a list of words which match a given prefix. I'm then using regex to filter out the words which don't meet the entire specified pattern.

EX: if the pattern I'm searching for is: CH??S? and this is a subset of the dictionary which matches my initial prefix: {CHABAD, CHACHA, CHARIOT, CHATTED, CHEATER, CHOMSKY, CHANNEL CHAFED, CHAFER, CHAINS, CHAIRS, CHEESE, CHEESY CHRONO, CHUTES, CHISEL}

I would search the trie with 'CH' prefix and then filter out words which match my desired pattern of CH??S? (CHEESY, CHEESE, CHISEL) and return those.

I am wondering if there is a faster way to do this to avoid using the regex in the final step. I thought I could use a suffix tree (Ukkonen's suffix tree algorithm in plain English )or the boyer-moore algorithm but neither work because they search on suffixes not on patterns.

Upvotes: 0

Views: 2256

Answers (1)

templatetypedef
templatetypedef

Reputation: 373482

Here's a nice recursive algorithm you can use that eliminates the need to use a final regex pass. It works by matching a pattern P against a tree T:

FindMatches(pattern P, tree T) {
    if (tree is empty) {
        return that there are no matches;
    }

    if (pattern is empty) {
        report a match if T is a word;
    } else if (pattern[0] is a letter) {
        FindMatches(P[1:], T.childFor(pattern[0]));
    } else if (pattern[0] is ?) {
        for (each child C of T) {
             gather matches from FindMatches(P[1:], C);
        }
        report all matches found this way;
    } else {
        something is wrong with your pattern;
    }
}

Your approach of just matching CH follows along with this approach. However, when you get to the point where you're matching ? characters, your approach just does a DFS to find all words below the current point, whereas this approach instead just branches on one level and gathers up matches. Stated differently, this approach is a hybrid between "follow one character at a time if there's a clear character to follow" and "explore the whole subtree looking for matches," focusing the branching strategically.

Upvotes: 1

Related Questions