Reputation: 288
So I was asked an interesting question recently related to strings and substring. Still trying to get the most optimal answer to this. I'll prefer answer in Java though any psuedo-code/language will be good as well.
The question is:
I am given a string S. I have to divide it into maximum number of substrings(not subsequence) such that no substring has character which is present in another substring.
Examples:
1.
S = "aaaabbbcd"
Substrings = ["aaaa","bbb","c","d"]
2.
S = "ababcccdde"
Substrings = ["abab","ccc","dd","e"]
3.
S = "aaabbcccddda"
Substrings = ["aaabbcccddda"]
Will be really glad if I can get a solution which is better than O(n^2)
Thanks for the help.
Upvotes: 1
Views: 580
Reputation: 12679
The accepted answer includes some unnecessary complexity in the implementation of algorithm. It is very straight forward to divide strings (as the examples posted by OP in question) into maximum number of substrings such that no substring has character which is present in another substring.
Algorithm:
(assumption: the input string is a not null string having 1
or more characters within 'a'
to 'z'
inclusive)
Iterate through string and for every character in input string-
a). If the current character last position is greater than substring end position than update substring end position to current character last position.
b). Add (or print) current character processing as part of current substring.
c). If substring end position is equal to the position of current character processing then it is end of a unique substring and from next character the new substring starts.
3
until input string end.Implementation:
#include <stdio.h>
#include <string.h>
void unique_substr(const char * pst) {
size_t ch_last_pos[26] = {0};
size_t subst_end_pos = 0;
size_t len = strlen(pst);
printf ("%s -> ", pst);
for (size_t i = 0; i < len; i++) {
ch_last_pos[pst[i] - 'a'] = i;
}
for (size_t i = 0; i < len; i++) {
size_t pos = ch_last_pos[pst[i] - 'a'];
if (pos > subst_end_pos) {
subst_end_pos = pos;
}
printf ("%c", pst[i]);
if (subst_end_pos == i) {
printf (" ");
}
}
printf ("\n");
}
//Driver program
int main(void) {
//base cases
unique_substr ("b");
unique_substr ("ab");
//strings posted by OP in question
unique_substr ("aaaabbbcd");
unique_substr ("ababcccdde");
unique_substr ("aaabbcccddda");
return 0;
}
Output:
# ./a.out
b -> b
ab -> a b
aaaabbbcd -> aaaa bbb c d
ababcccdde -> abab ccc dd e
aaabbcccddda -> aaabbcccddda
Upvotes: 0
Reputation: 5455
You can do it with two passes. On the 1st you determine the max index of each character in the string. On the 2nd you keep track of the max index of each encountered character. If the max equals the current index you've reached the end of a unique substring.
Here's some Java code to illustrate:
char[] c = "aaaabbbcd".toCharArray();
int[] max = new int[26];
for(int i=0; i<c.length; i++) max[c[i]-'a'] = i;;
for(int i=0, m=0, lm=0; i<c.length;)
if((m = Math.max(m, max[c[i]-'a'])) == i++)
System.out.format("%s ", s.substring(lm, lm = i));
Output:
aaaa bbb c d
And for the other 2 strings:
abab ccc dd e
aaabbcccddda
Upvotes: 0
Reputation: 7714
It can be done in O(n) time.
The idea behind it is to predict where each substring will end. We know that if we read a char, then the last occurrence of this char must be in the same substring it is (otherwise there would be a repeated char in two distinct substrings).
Let's use abbacacd
as example. Suppose we know the first and the last occurrences of every char in the string.
01234567
abbacacd (reading a at index 0)
- we know that our substring must be at least abbaca (last occurrence of a);
- the end of our substring will be the maximum between the last occurrence of
all the chars inside the own substring;
- we iterate through the substring:
012345 (we found b at index 1)
abbaca substring_end = maximum(5, last occurrence of b = 2)
substring_end = 5.
012345 (we found b at index 2)
abbaca substring_end = maximum(5, last occurrence of b = 2)
substring_end = 5.
012345 (we found a at index 3)
abbaca substring_end = maximum(5, last occurrence of a = 5)
substring_end = 5.
012345 (we found c at index 4)
abbaca substring_end = maximum(5, last occurrence of c = 6)
substring_end = 6.
0123456 (we found a at index 5)
abbacac substring_end = maximum(6, last occurrence of a = 5)
substring_end = 6.
0123456 (we found c at index 6)
abbacac substring_end = maximum(6, last occurrence of c = 6)
substring_end = 6.
---END OF FIRST SUBSTRING---
01234567
abbacacd [reading d]
- the first and last occurrence of d is the same index.
- d is an atomic substring.
The O(n) solution is:
#include <bits/stdc++.h>
using namespace std;
int main(){
int pos[26][2];
int index;
memset(pos, -1, sizeof(pos));
string s = "aaabbcccddda";
for(int i = 0; i < s.size(); i++){
index = s[i] - 'a';
if(pos[index][0] == -1) pos[index][0] = i;
pos[index][1] = i;
}
int substr_end;
for(int i = 0; i < s.size(); i++){
index = s[i] - 'a';
if(pos[index][0] == pos[index][1]) cout<<s[i]<<endl;
else{
substr_end = pos[index][1];
for(int j = i + 1; j < substr_end; j++){
substr_end = max(substr_end, pos[s[j] - 'a'][1]);
}
cout<<s.substr(i, substr_end - i + 1)<<endl;
i = substr_end;
}
}
}
Upvotes: 2