Reputation: 3288
Given a string S
and a set of n
substrings. Remove every instance of those n
substrings from S
so that S
is of the minimum length and output this minimum length.
Example 1
S = ccdaabcdbb
n = 2
substrings = ab, cd
Output
2
Explanation:
ccdaabcdbb -> ccdacdbb -> cabb -> cb (length=2)
Example 2
S = abcd
n = 2
substrings = ab,bcd
Output
1
How do I solve this problem ?
Upvotes: 2
Views: 4232
Reputation: 60044
A simple Brute-force search algorithm is:
In Pseudocode:
def min_final_length (input, substrings):
best = len(input)
for substr in substrings:
beg = 0
// find all occurrences of substr in input and recurse
while (found = find_substring(input, substr, from=beg)):
input_without_substr = input[0:found]+input[found+len(substr):len(input)]
best = min(best, min_final_length(input_without_substr,substrings))
beg = found+1
return best
Let complexity be F(S,n,l)
where S
is the length of the input
string, n
is the cardinality of the set substrings
and l
is the "characteristic length" of substrings. Then
F(S,n,l) ~ n * ( S * l + F(S-l,n,l) )
Looks like it is at most O(S^2*n*l)
.
Upvotes: 3
Reputation: 71
I solved it using trie+dp.
First insert your substrings in a trie. Then define the state of the dp is some string, walk through that string and consider each i (for i =0 .. s.length()) as the start of some substring. let j=i and increment j as long as you have a suffix in the trie (which will definitely land you to at least one substring and may be more if you have common suffix between some substring, for example "abce" and "abdd"), whenever you encounter an end of some substring, go solve the new sub-problem and find the minimum between all substring reductions.
Here is my code for it. Don't worry about the length of the code. Just read the solve function and forget about the path, I included it to print the string formed.
struct node{
node* c[26];
bool str_end;
node(){
for(int i= 0;i<26;i++){
c[i]=NULL;
}
str_end= false;
}
};
class Trie{
public:
node* root;
Trie(){
root = new node();
}
~Trie(){
delete root;
}
};
class Solution{
public:
typedef pair<int,int>ii;
string get_str(string& s,map<string,ii>&path){
if(!path.count(s)){
return s;
}
int i= path[s].first;
int j= path[s].second;
string new_str =(s.substr(0,i)+s.substr(j+1));
return get_str(new_str,path);
}
int solve(string& s,Trie* &t, map<string,int>&dp,map<string,ii>&path){
if(dp.count(s)){
return dp[s];
}
int mn= (int)s.length();
for(int i =0;i<s.length();i++){
string left = s.substr(0,i);
node* cur = t->root->c[s[i]-97];
int j=i;
while(j<s.length()&&cur!=NULL){
if(cur->str_end){
string new_str =left+s.substr(j+1);
int ret= solve(new_str,t,dp,path);
if(ret<mn){
path[s]={i,j};
}
}
cur = cur->c[s[++j]-97];
}
}
return dp[s]=mn;
}
string removeSubstrings(vector<string>& substrs, string s){
map<string,ii>path;
map<string,int>dp;
Trie*t = new Trie();
for(int i =0;i<substrs.size();i++){
node* cur = t->root;
for(int j=0;j<substrs[i].length();j++){
if(cur->c[substrs[i][j]-97]==NULL){
cur->c[substrs[i][j]-97]= new node();
}
cur = cur->c[substrs[i][j]-97];
if(j==substrs[i].length()-1){
cur->str_end= true;
}
}
}
solve(s,t,dp,path);
return get_str(s, path);
}
};
int main(){
vector<string>substrs;
substrs.push_back("ab");
substrs.push_back("cd");
Solution s;
cout << s.removeSubstrings(substrs,"ccdaabcdbb")<<endl;
return 0;
}
Upvotes: 0
Reputation: 63757
The following solution would have an complexity of O(m * n) where m = len(S) and n is the number of substring
def foo(S, sub):
i = 0
while i < len(S):
for e in sub:
if S[i:].startswith(e):
S = S[:i] + S[i+len(e):]
i -= 1
break
else: i += 1
return S, i
Upvotes: 1
Reputation: 2881
If you are for raw performance and your string is very large, you can do better than brute force. Use a suffix trie (E.g, Ukkonnen trie) to store your string. Then find each substring (which us done in O(m) time, m being substring length), and store the offsets to the substrings and length in an array. Then use the offsets and length info to actually remove the substrings by filling these areas with \0 (in C) or another placeholder character. By counting all non-Null characters you will get the minimal length of the string.
This will als handle overlapping substring, e.g. say your string is "abcd", and you have two substrings "ab" and "abcd".
Upvotes: 0