Reputation: 43
How to count matching characters in subsequent substrings in O(n). Substrings are formed by removing one character at a time from the beginning.
For example: String given is ababcabab
, the result expected is 8
Substr1: babcabab
Count:0
Substr2: abcabab
Count:2 as 1st two characters matches with given original string, 3rd character doesnt match, so checking for match is stopped
Substr3: bcabab
Count:0
SubStr4: cabab
Count:0
SubStr5: abab
Count:4
SubStr6: bab
Count:0
Substr7: ab
Count:2
SubStr8: b
Count:0
Result expected: 2+4+2=8
Upvotes: 2
Views: 435
Reputation: 23955
We can solve this in O(n) by drawing some logical conclusions: Since all matches are the same; that is, they are matching the string itself; any match that starts on index i
of the string will contain all matches that start before i
(or a portion as far as length permits). Additionally, any match where its length is greater than its starting index will consist of repetitions of the section of the start of the string to the start of the match. We need only record in full the matches we can find in one traversal of the string without stepping back, and deduce the rest.
Examples (non zero based):
"aaaaaa":
Starting on index 2, we have a match length 5. This match necessarily includes
a match of length 4 starting on index 3 (since index 3 is index 2 for the
substring that starts on index 2). Continuing the same logic, we add 3 + 2 + 1
for a total of 15, without needing to scan and compare more than Substr2.
"aabaabaa":
Starting on index 2, we have a match length 1.
Starting on index 4, we have a match length 5. This match necessarily includes
a match of length 1 starting on index 5 (since index 5 is index 2 for the
substring that starts on index 4). It also necessarily includes a match of
length (5 - 3) starting on index 7 (since index 7 is index 4 for the substring
that starts on index 4), and this match implies another match of length 1,
starting on index 8. Altogether 1 + 5 + 1 + (5 - 3) + 1 = 10. Again, the scan
was O(n).
"aabaabaabaabaa":
Starting on index 2, we have a match length 1.
Starting on index 4, we have a match length 11.
1 + 11 + 1 + (11 - 3) + 1 + (8 - 3) + 1 + (5 - 3) + 1 = 31.
"aabaaab":
Starting on index 2, we have a match length 1.
For repeated patterns in the beginning of the string, we can use a formula
rather than multiple scans, so a string like "aabaaaaaaaaaab" would have the
same complexity as the one above, (number of times the pattern repeats - number
of times the pattern repeats in the beginning of the string) * total length of
repeated pattern at the start of the string. We identify a pattern if the
length of the first match is a multiple of its starting index. Identifying
this pattern and using the formula also prevents erroneously missing the
correct match to record (e.g., without it, we would have identified 'aa' and
'a' at the end as matches and missed the 'aab').
So starting on index 4, we have (3 - 2) * 2 = 2
Starting on index 5, we have a match length 3.
1 + 2 + 3 + 1 = 7
"ababcabab":
Starting on index 3, we have a match length 2.
Starting on index 6, we have a match length 4.
2 + 4 + 2 = 8
Upvotes: 0
Reputation: 10565
You can create a suffix array (and LCP array) in O(n) with Ukkonen's algorithm, then it becomes trivial to find it with another pass in O(n), summing LCP values around the original string:
LCP SA suffix
0 9 .
0 7 ab.
> 2 5 abab.
> 4 0 ababcabab.
> 2 2 abcabab.
0 8 b.
1 6 bab.
3 1 babcabab.
1 3 bcabab.
0 4 cabab.
0 0 ababcabab.
Upvotes: 4
Reputation: 386
use a for loop (java in this example):
String s = "ababcabab";
int count = 0;
int count = 0;
for(int i = 1; i < s.length(); i++){ // for loop for all substrings [EDIT]: starts w/ 1 instead of 0. Thanks to vincent
String sub = s.substring(i);
for(int j = 0; j < sub.length() && sub.toCharArray()[j] == s.toCharArray()[j]; j++) /note that for & while loops in java are very similar. stops when substring doesn't match anymore **OR** substring's end is reached
{
count++; // increases count for every matching char in substring in a row
}
}
System.out.println("The count is: " + count);
Upvotes: 0