Reputation: 275
I was thinking of the following problem: Given a string S, let the length of the ith substring be li and number of occurrences of the ith substring be oi. Print the substring such that li*oi is maximized.
I have O(n3) solution (brute force) for this problem where I am generating all the substrings and finding the substring with maximum value. My code for the same is as follows:
public static void solve(String S) {
long max = Integer.MIN_VALUE;
String res = "";
for (int i = 0; i < S.length(); i++) {
for (int j = 1; j <= S.length() - i; j++) {
String s = S.substring(i, i + j);
int o = countOccurrences(S, s);
long p = (long) o * (long) s.length();
if (max < p) {
max = p;
res = s;
}
}
}
System.out.println(res);
}
where countOccurrences() method takes O(n) time. I was wondering if there was a more efficient way to achieve this.
Upvotes: 5
Views: 836
Reputation: 6246
Improved algorithm will be as follows :- (Psuedo code)
int max = 0;
for(int i=0;i<str.length;) {
occurences = get_index(str[i]);
for(int j=i+1;j<str.length;j++) {
if(occurences.size()==0)
break;
for(int k=0;k<occurences.size();k++) {
int next_ind = occurrences[k]+j-i;
if(next_ind==length||str[next_ind]!=str[j]) {
delete(occurences[k]);
}
}
int score = ocurrences.size()*(j-i+1);
if(max<score)
score = max;
}
}
Time Complexity :-
Ideally the occurrences of the substring should decrease like n,n/2,n/3....
at max so hence the algorithm is n*n*(1+1/2+1/3...) = O(n^2*logn) as (1+1/2+1/3...1/n) => logn
. The occurence deletion can be done easily in O(1)
using linked list.
Upvotes: 0
Reputation: 3001
Presumably you can calculate the maximum available 'score' with what you have left, and thus avoid checking for substrings beyond a certain point. I'd guess it still scales badly, but it artificially reduces your 'n' value.
For example, if you have a string of length 10, and you manage to get a 3 character substring twice in the first 3 characters (ie 0-2, 3-5), then you can say max = 6. At that point, once you move beyond i=4, you can't score more than 6, so you can stop?
Upvotes: 0
Reputation: 51246
Here's a linear-time algorithm:
The key points are that
It might also be possible to do this using suffix arrays instead of suffix trees, in which case it's likely to require a constant factor less memory, but add a logarithmic factor to the running time.
Upvotes: 2
Reputation: 2105
An idea : You could shorten a bit the algorithm by creating an occurence list. This way before running countOccurences
you would check if it hasn't been already counted for that occurence and you wouldn't run countOccurences
again for that same occurence.
This is a first optimization, there should be other ones but it's a start.
Upvotes: 1