Avinash
Avinash

Reputation: 13257

generate all possible unique substring of a string

I need to get all the unique substring of a string. I have stored the string into a trie but I am not able to figure out how can i used the same to print all the unique substring for example

string aab all unique substrings are {"a", "aa", "aab", "ab", "b"}

here is my code for trie

#include <iostream>
#include <map>
#include <string>
#include <stack>

struct trie_node_t {
    typedef std::map<char, trie_node_t *> child_node_t;
    child_node_t m_childMap;
    trie_node_t() :m_childMap(std::map<char, trie_node_t*>()) {}

    void insert( std::string& word ) {
        trie_node_t *pNode = this;
        for ( std::string::const_iterator itr = word.begin(); itr != word.end(); ++itr) {
            char letter = *itr;
            if ( pNode->m_childMap.find(letter) == pNode->m_childMap.end()){
                pNode->m_childMap[letter] = new trie_node_t();
            }
            pNode = pNode->m_childMap[letter];
        }
    }

    void print() {
    }
};

int main ( int argc, char **argv ) {
    trie_node_t trie;
    trie.insert(std::string("aab"));
    trie.print();
}

How do i implement print function which will print all the unique substring.

I am looking for Linear time approach

Since I have built a trie, is there a any way I can iterate over and whenever I visit any node I can print it as a unique string.

Upvotes: 3

Views: 5255

Answers (3)

Rafał Dowgird
Rafał Dowgird

Reputation: 45101

First, build a suffix tree. This represents all suffixes of the string and can be done in linear time. Since every substring is a prefix of a suffix, now you need to enumerate the prefixes of the suffixes.

Fortunately if two suffixes share a common prefix, the prefix will be on a single common path from root, so there's a 1-1 mapping between paths from root(*) in the tree and unique suffixes.

Therefore it is sufficient to iterate over all paths from root in the suffix tree to produce all the unique substrings.

(*) The paths in the suffix tree are compressed, i.e. an edge might represent several characters. You need to uncompress the paths to produce all the substrings, i.e. treat compressed edges as multi-node paths.

Upvotes: 6

iloahz
iloahz

Reputation: 4561

There are "end signs" in Trie, i.e. if a node is the last char of a string, then it's marked as one terminal.

So if you need to print all the strings in a Trie, you'll need to dfs() on that Trie, whenever visiting a node with end sign(which means it's a terminal), you know it's the last char of some string, so print it.

Upvotes: 0

Note that every substring of myString has a length between 0 and strlen(myString). So just loop over every possible length, and every possible starting-position of the substring.

Upvotes: 0

Related Questions