Reputation: 288

Maximize number of substring such that no substring has characters from other substring

So I was asked an interesting question recently related to strings and substring. Still trying to get the most optimal answer to this. I'll prefer answer in Java though any psuedo-code/language will be good as well.

The question is:

I am given a string S. I have to divide it into maximum number of substrings(not subsequence) such that no substring has character which is present in another substring.

Examples:

1.
   S = "aaaabbbcd"
   Substrings = ["aaaa","bbb","c","d"]

2.
   S = "ababcccdde"
   Substrings = ["abab","ccc","dd","e"]

3.
   S = "aaabbcccddda"
   Substrings = ["aaabbcccddda"]

Will be really glad if I can get a solution which is better than O(n^2)

Thanks for the help.

Upvotes: 1

Answers (3)

H.S.

Reputation: 12679

The accepted answer includes some unnecessary complexity in the implementation of algorithm. It is very straight forward to divide strings (as the examples posted by OP in question) into maximum number of substrings such that no substring has character which is present in another substring.

Algorithm:
(assumption: the input string is a not null string having 1 or more characters within 'a' to 'z' inclusive)

Record the last position of each character of input string.
Assume, the first substring end position is 0.
Iterate through string and for every character in input string-

a). If the current character last position is greater than substring end position than update substring end position to current character last position.
b). Add (or print) current character processing as part of current substring.
c). If substring end position is equal to the position of current character processing then it is end of a unique substring and from next character the new substring starts.
Repeat 3 until input string end.

Implementation:

#include <stdio.h>
#include <string.h>

void unique_substr(const char * pst) {
    size_t ch_last_pos[26] = {0};
    size_t subst_end_pos = 0;
    size_t len = strlen(pst);

    printf ("%s -> ", pst);
    for (size_t i = 0; i < len; i++) {
        ch_last_pos[pst[i] - 'a'] = i;
    }

    for (size_t i = 0; i < len; i++) {
        size_t pos = ch_last_pos[pst[i] - 'a'];
        if (pos > subst_end_pos) {
            subst_end_pos = pos;
        }

        printf ("%c", pst[i]);

        if (subst_end_pos == i) {
            printf (" ");
        }
    }
    printf ("\n");
}

//Driver program

int main(void) {

    //base cases
    unique_substr ("b");
    unique_substr ("ab");

    //strings posted by OP in question
    unique_substr ("aaaabbbcd");
    unique_substr ("ababcccdde");
    unique_substr ("aaabbcccddda");

    return 0;
}

Output:

# ./a.out
b -> b 
ab -> a b 
aaaabbbcd -> aaaa bbb c d 
ababcccdde -> abab ccc dd e 
aaabbcccddda -> aaabbcccddda

Upvotes: 0

RaffleBuffle

Reputation: 5455

You can do it with two passes. On the 1st you determine the max index of each character in the string. On the 2nd you keep track of the max index of each encountered character. If the max equals the current index you've reached the end of a unique substring.

Here's some Java code to illustrate:

char[] c = "aaaabbbcd".toCharArray();

int[] max = new int[26];        
for(int i=0; i<c.length; i++) max[c[i]-'a'] = i;;

for(int i=0, m=0, lm=0; i<c.length;)
  if((m = Math.max(m, max[c[i]-'a'])) == i++) 
    System.out.format("%s ", s.substring(lm, lm = i));

Output:

aaaa bbb c d

And for the other 2 strings:

abab ccc dd e 
aaabbcccddda

Upvotes: 0

Daniel

Reputation: 7714

It can be done in O(n) time.

The idea behind it is to predict where each substring will end. We know that if we read a char, then the last occurrence of this char must be in the same substring it is (otherwise there would be a repeated char in two distinct substrings).

Let's use abbacacd as example. Suppose we know the first and the last occurrences of every char in the string.

01234567
abbacacd   (reading a at index 0)

- we know that our substring must be at least abbaca (last occurrence of a);
- the end of our substring will be the maximum between the last occurrence of 
  all the chars inside the own substring;
- we iterate through the substring:

012345     (we found b at index 1)
abbaca      substring_end = maximum(5, last occurrence of b = 2)
            substring_end = 5.

012345     (we found b at index 2)
abbaca      substring_end = maximum(5, last occurrence of b = 2)
            substring_end = 5.

012345     (we found a at index 3)
abbaca      substring_end = maximum(5, last occurrence of a = 5)
            substring_end = 5.

012345     (we found c at index 4)
abbaca      substring_end = maximum(5, last occurrence of c = 6)
            substring_end = 6.

0123456    (we found a at index 5)
abbacac     substring_end = maximum(6, last occurrence of a = 5)
            substring_end = 6.

0123456    (we found c at index 6)
abbacac     substring_end = maximum(6, last occurrence of c = 6)
            substring_end = 6. 

---END OF FIRST SUBSTRING---

01234567
abbacacd           [reading d]

- the first and last occurrence of d is the same index.
- d is an atomic substring.

The O(n) solution is:

#include <bits/stdc++.h>

using namespace std;

int main(){
    int pos[26][2];
    int index;
    memset(pos, -1, sizeof(pos));
    string s = "aaabbcccddda";

    for(int i = 0; i < s.size(); i++){
        index = s[i] - 'a';
        if(pos[index][0] == -1) pos[index][0] = i;
        pos[index][1] = i;
    }

    int substr_end;
    for(int i = 0; i < s.size(); i++){
        index = s[i] - 'a';
        if(pos[index][0] == pos[index][1]) cout<<s[i]<<endl;
        else{
            substr_end = pos[index][1];
            for(int j = i + 1; j < substr_end; j++){
                substr_end = max(substr_end, pos[s[j] - 'a'][1]);
            }
            cout<<s.substr(i, substr_end - i + 1)<<endl;
            i = substr_end;
        }
    }
}

Upvotes: 2

Maximize number of substring such that no substring has characters from other substring

Answers (3)

Related Questions