Reputation: 71

Number of Distinct substring of a string

I was solving DISTINCT SUBSTRING (given a string, we need to find the total number of its distinct substrings).
I am using trie of suffixes to solve it.
I am passing the test cases, but getting TLE when I submit. Also, the space consumed is very large, at 4093M.

Note: Since there can be 256 char in total, I am setting an array size of 257, and the ascii value is acting as the index.

What I am thinking now:

for(int i=0;i<p;i++){
    string temp = str.substr(i,p-i);
    insert1(root,temp);
}

Since substr() may take O(n) time, in the worst case insert function also takes (n) time in the worst case, and O(n) for loop: O(n^3). This is getting me TLE.

error: could not convert 'temp' from 'std::__cxx11::string* {aka std::__cxx11::basic_string*}' to 'std::__cxx11::string {aka std::__cxx11::basic_string}'| ||=== Build failed: 2 error(s), 0 warning(s) (0 minute(s), 0 second(s)) ===|

So I am thinking to replace substr() by something like this:

for(int i=0;i<p;i++){
     string *temp = &str[i];
     insert1(root,temp);
}  ///and it is giving me error please suggest here what is the mistake and what to do

So that I can save O(n) time.

Please tell me how can I modify my trie approach so it gets accepted.

#include<iostream>
#include<string>
using namespace std;

const int alphabetsize = 257;
int cnt=0;
struct trienode{
    struct trienode* children[alphabetsize];
    bool isendofword;
};

struct trienode *getnode(void){
    struct trienode *newnode = new trienode;
    newnode->isendofword = false;
    for(int i=0;i<alphabetsize;i++){
        newnode->children[i]=NULL;
    }
    return newnode;
}
void insert1(struct trienode* root,string &key){
    struct trienode *temp = root;
    for(int i=0;i<key.length();i++){
        int index = key[i];
        if(!temp->children[index]){
            temp->children[index]=getnode();
            cnt++;
        }
        temp = temp->children[index];
    }
    temp->isendofword=true;
}
int main(){
    int t;
    cin>>t;
    while(t--){
        cnt=0;
        string str;
        cin>>str;
        int p = str.length();
        struct trienode* root = getnode();
        for(int i=0;i<p;i++){
            string temp = str.substr(i,p-i);
            insert1(root,temp);
        }
        cout<<cnt<<endl;
    }

}

Upvotes: 7

Answers (2)

גלעד ברקן

Reputation: 23955

I'm not experienced in C++ but the error you commented about seems like it might stem from different expectations by the compiler for the type of variable it's receiving when encountering the variable temp.

As others have noted in SPOJ and in the comments, since the input length is only 256 characters, you could possibly get away with brute-force hashing and counting all of the substring occurences.

Another option is to examine the longest common prefixes in the suffix array for the string, both of which there are known construction algorithms for. If we iterate from the end of the suffix array, the difference between the current suffix length and the longest common prefix with its neighbour to the right tells us how many new distinct substrings are introduced.

For example:

01234
CCCCC

suffix array:
01234
43210
CCCCC
 CCCC
  CCC
   CC
    C

i: 4 -> 5 new substrings
i: 3 -> lcp(3,4) = len(3) no new substrings
i: 2 -> lcp(2,3) = len(2) no new substrings
i: 1 -> lcp(1,2) = len(1) no new substrings
i: 0 -> lcp(0,1) = len(0) no new substrings

total: 5 distinct substrings

Second example:

01234
ABABA

suffix array:
01234
42031
AAABB
 BBAA
 AA B
  B A
  A

i: 4 -> 4 new substrings
i: 3 -> lcp(3,4) = len(3) no new substrings
i: 2 -> 5 new substrings
i: 1 -> lcp(1,2) = len(1) no new substrings
i: 0 -> lcp(0,1) = len(0) no new substrings

total: 9 distinct substrings

Upvotes: 1

Bo R

Reputation: 2371

To save space, reuse the space of the strings by using std::string_view and store them in a std::unordered_set container. Should be enough to handle the problem of memory waste.

Upvotes: 0

Number of Distinct substring of a string

Answers (2)

Related Questions