Reputation: 71
I was solving DISTINCT SUBSTRING (given a string, we need to find the total number of its distinct substrings).
I am using trie of suffixes to solve it.
I am passing the test cases, but getting TLE
when I submit. Also, the space consumed is very large, at 4093M
.
Note: Since there can be 256 char in total, I am setting an array size of 257, and the ascii value is acting as the index.
What I am thinking now:
for(int i=0;i<p;i++){
string temp = str.substr(i,p-i);
insert1(root,temp);
}
Since substr()
may take O(n) time, in the worst case insert function also takes
(n) time in the worst case, and O(n) for loop: O(n^3). This is getting me TLE.
error: could not convert 'temp' from 'std::__cxx11::string* {aka std::__cxx11::basic_string*}' to 'std::__cxx11::string {aka std::__cxx11::basic_string}'| ||=== Build failed: 2 error(s), 0 warning(s) (0 minute(s), 0 second(s)) ===|
So I am thinking to replace substr()
by something like this:
for(int i=0;i<p;i++){
string *temp = &str[i];
insert1(root,temp);
} ///and it is giving me error please suggest here what is the mistake and what to do
So that I can save O(n) time.
Please tell me how can I modify my trie approach so it gets accepted.
#include<iostream>
#include<string>
using namespace std;
const int alphabetsize = 257;
int cnt=0;
struct trienode{
struct trienode* children[alphabetsize];
bool isendofword;
};
struct trienode *getnode(void){
struct trienode *newnode = new trienode;
newnode->isendofword = false;
for(int i=0;i<alphabetsize;i++){
newnode->children[i]=NULL;
}
return newnode;
}
void insert1(struct trienode* root,string &key){
struct trienode *temp = root;
for(int i=0;i<key.length();i++){
int index = key[i];
if(!temp->children[index]){
temp->children[index]=getnode();
cnt++;
}
temp = temp->children[index];
}
temp->isendofword=true;
}
int main(){
int t;
cin>>t;
while(t--){
cnt=0;
string str;
cin>>str;
int p = str.length();
struct trienode* root = getnode();
for(int i=0;i<p;i++){
string temp = str.substr(i,p-i);
insert1(root,temp);
}
cout<<cnt<<endl;
}
}
Upvotes: 7
Views: 857
Reputation: 23955
I'm not experienced in C++ but the error you commented about seems like it might stem from different expectations by the compiler for the type of variable it's receiving when encountering the variable temp
.
As others have noted in SPOJ and in the comments, since the input length is only 256 characters, you could possibly get away with brute-force hashing and counting all of the substring occurences.
Another option is to examine the longest common prefixes in the suffix array for the string, both of which there are known construction algorithms for. If we iterate from the end of the suffix array, the difference between the current suffix length and the longest common prefix with its neighbour to the right tells us how many new distinct substrings are introduced.
For example:
01234
CCCCC
suffix array:
01234
43210
CCCCC
CCCC
CCC
CC
C
i: 4 -> 5 new substrings
i: 3 -> lcp(3,4) = len(3) no new substrings
i: 2 -> lcp(2,3) = len(2) no new substrings
i: 1 -> lcp(1,2) = len(1) no new substrings
i: 0 -> lcp(0,1) = len(0) no new substrings
total: 5 distinct substrings
Second example:
01234
ABABA
suffix array:
01234
42031
AAABB
BBAA
AA B
B A
A
i: 4 -> 4 new substrings
i: 3 -> lcp(3,4) = len(3) no new substrings
i: 2 -> 5 new substrings
i: 1 -> lcp(1,2) = len(1) no new substrings
i: 0 -> lcp(0,1) = len(0) no new substrings
total: 9 distinct substrings
Upvotes: 1
Reputation: 2371
To save space, reuse the space of the strings by using std::string_view and store them in a std::unordered_set
container. Should be enough to handle the problem of memory waste.
Upvotes: 0