swasthi
swasthi

Reputation: 43

How to count matching characters in subsequent substring in O(n)

How to count matching characters in subsequent substrings in O(n). Substrings are formed by removing one character at a time from the beginning.

For example: String given is ababcabab, the result expected is 8

Result expected: 2+4+2=8

Upvotes: 2

Views: 435

Answers (3)

גלעד ברקן
גלעד ברקן

Reputation: 23955

We can solve this in O(n) by drawing some logical conclusions: Since all matches are the same; that is, they are matching the string itself; any match that starts on index i of the string will contain all matches that start before i (or a portion as far as length permits). Additionally, any match where its length is greater than its starting index will consist of repetitions of the section of the start of the string to the start of the match. We need only record in full the matches we can find in one traversal of the string without stepping back, and deduce the rest.

Examples (non zero based):

"aaaaaa":
Starting on index 2, we have a match length 5. This match necessarily includes
a match of length 4 starting on index 3 (since index 3 is index 2 for the
substring that starts on index 2). Continuing the same logic, we add 3 + 2 + 1
for a total of 15, without needing to scan and compare more than Substr2.

"aabaabaa":
Starting on index 2, we have a match length 1.
Starting on index 4, we have a match length 5. This match necessarily includes
a match of length 1 starting on index 5 (since index 5 is index 2 for the
substring that starts on index 4). It also necessarily includes a match of 
length (5 - 3) starting on index 7 (since index 7 is index 4 for the substring
that starts on index 4), and this match implies another match of length 1, 
starting on index 8. Altogether 1 + 5 + 1 + (5 - 3) + 1 = 10. Again, the scan
was O(n).

"aabaabaabaabaa":
Starting on index 2, we have a match length 1.
Starting on index 4, we have a match length 11.
1 + 11 + 1 + (11 - 3) + 1 + (8 - 3) + 1 + (5 - 3) + 1 = 31.

"aabaaab":
Starting on index 2, we have a match length 1.
For repeated patterns in the beginning of the string, we can use a formula 
rather than multiple scans, so a string like "aabaaaaaaaaaab" would have the 
same complexity as the one above, (number of times the pattern repeats - number
of times the pattern repeats in the beginning of the string) * total length of
repeated pattern at the start of the string. We identify a pattern if the 
length of the first match is a multiple of its starting index. Identifying 
this pattern and using the formula also prevents erroneously missing the 
correct match to record (e.g., without it, we would have identified 'aa' and 
'a' at the end as matches and missed the 'aab'). 
So starting on index 4, we have (3 - 2) * 2 = 2
Starting on index 5, we have a match length 3.
1 + 2 + 3 + 1 = 7

"ababcabab":
Starting on index 3, we have a match length 2.
Starting on index 6, we have a match length 4.
2 + 4 + 2 = 8

Upvotes: 0

Juan Lopes
Juan Lopes

Reputation: 10565

You can create a suffix array (and LCP array) in O(n) with Ukkonen's algorithm, then it becomes trivial to find it with another pass in O(n), summing LCP values around the original string:

    LCP SA  suffix
    0   9   .
    0   7   ab.
>   2   5   abab.
>   4   0   ababcabab.
>   2   2   abcabab.
    0   8   b.
    1   6   bab.
    3   1   babcabab.
    1   3   bcabab.
    0   4   cabab.
    0   0   ababcabab.

Upvotes: 4

Josef Hoppe
Josef Hoppe

Reputation: 386

use a for loop (java in this example):

String s = "ababcabab";
int count = 0;
    int count = 0;
    for(int i = 1; i < s.length(); i++){ // for loop for all substrings [EDIT]: starts w/ 1 instead of 0. Thanks to vincent
        String sub = s.substring(i);
        for(int j = 0; j < sub.length() && sub.toCharArray()[j] == s.toCharArray()[j]; j++) /note that for & while loops in java are very similar. stops when substring doesn't match anymore **OR** substring's end is reached
        {
            count++; // increases count for every matching char in substring in a row
        }
    }
    System.out.println("The count is: " + count);

Upvotes: 0

Related Questions