alooparantha

Reputation: 47

Number of substrings of a given string containing a specific character

What is the most efficient algorithm to count the number of substrings of a given string that contain a given character?

For example, for the string abb and the character b:

Substrings: a, b, b, ab, bb, abb. Answer: 5 of them contain b at least once.

P.S. I solved this by generating all the substrings and then checking each one, in O(n^2). I just want to know whether there is a better solution.
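For reference, a quadratic baseline in the spirit of the P.S. could look like the sketch below. This is not the poster's code, and the function name is hypothetical. It avoids building substring objects: for each start index it extends the end one character at a time and remembers whether the character has been seen, so every (start, end) pair is checked in O(1).

```python
def count_substrings_bruteforce(s: str, c: str) -> int:
    """Count substrings of s that contain c by checking every (start, end) pair.

    O(n^2) time, O(1) extra space: once c has appeared in s[start..end],
    every longer substring with the same start also contains it.
    """
    n = len(s)
    total = 0
    for start in range(n):
        seen = False  # has c appeared in s[start..end] so far?
        for end in range(start, n):
            if s[end] == c:
                seen = True
            if seen:
                total += 1
    return total


print(count_substrings_bruteforce("abb", "b"))  # 5
```

It is mainly useful as a reference implementation to test faster approaches against on small inputs.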

Upvotes: 0

Views: 3098

Answers (4)

MBo

Reputation: 80187

Suppose you need to count the substrings that contain the character X.

Scan the string from left to right, keeping the position of the last X in lastX, with starting value -1.

When you meet X at position i, add i+1 to the result and update lastX
(this is the number of substrings ending at the current position, and they all contain X).

When you meet another character, add lastX + 1 to the result
(this is again the number of substrings ending at the current position and containing X),
because the rightmost possible start of such a substring is the position of the last X.
Before the first X, lastX = -1, so nothing is added.

The algorithm is linear.
Example:

a X a a X a
idx  char  good substrings ending at idx  lastX  count  overall
 0    a    -                                 -1      0        0
 1    X    aX X                               1      2        2
 2    a    aXa Xa                             1      2        4
 3    a    aXaa Xaa                           1      2        6
 4    X    aXaaX XaaX aaX aX X                4      5       11
 5    a    aXaaXa XaaXa aaXa aXa Xa           4      5       16

Python code:

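def subcnt(s, c):
    # last: index of the most recent occurrence of c, -1 if none seen yet
    last = -1
    cnt = 0
    for i, ch in enumerate(s):
        if ch == c:
            last = i
        # substrings ending at i that contain c start at 0..last: last + 1 of them
        cnt += last + 1
    return cnt

print(subcnt('abcdba', 'b'))  # 16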

Upvotes: 3

Dave

Reputation: 9085

Think of a substring as choosing two of the gaps between the letters of your string (including the gaps at both ends) and taking everything between them.

A string of length n has n+1 gaps, so there are choose(n+1,2) = n(n+1)/2 substrings.

Of those, for each run of k consecutive characters that doesn't include the target, there are choose(k+1,2) substrings made only of letters from that run. Every other substring of the main string must include the target.

Answer: choose(n+1,2) - sum(choose(k_i+1,2)), where the k_i are the lengths of the maximal runs of letters that don't include the target.
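The complement formula can be sketched in Python as follows (the function name is hypothetical). It assumes Python 3.8+ for `math.comb` and uses `itertools.groupby` to split the string into maximal runs of target and non-target characters; each non-target run of length k removes choose(k+1,2) substrings from the total.

```python
import math
from itertools import groupby


def count_by_complement(s: str, c: str) -> int:
    """All substrings minus those lying entirely inside a maximal run without c."""
    # n + 1 gaps, choose 2 of them: n(n+1)/2 substrings in total
    total = math.comb(len(s) + 1, 2)
    # groupby yields maximal runs of consecutive "is c" / "is not c" characters
    for is_target, run in groupby(s, key=lambda ch: ch == c):
        if not is_target:
            k = sum(1 for _ in run)  # length k of this run without c
            total -= math.comb(k + 1, 2)  # substrings confined to the run
    return total


print(count_by_complement("aXaaXa", "X"))  # 16
```

For aXaaXa this is 21 - (1 + 3 + 1) = 16, matching the running total in the table of the previous answer. Like the scan, it is linear in the length of the string.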

Upvotes: 0

nice_dev

Reputation: 17805

Consider the string abcdaefgabb and the given character a.

  • Loop over the string character by character.
  • If a character matches the given character, say a at index 4, then the substrings that contain it range from abcda to aefgabb. So we add (4 - 0 + 1) + (10 - 4) = 11. These are the substrings abcda, bcda, cda, da, a, ae, aef, aefg, aefga, aefgab and aefgabb.
  • The same applies wherever you find a, e.g. at index 0 and also at index 8.
  • The final answer is the sum of these values.

Update: You will have to maintain two pointers, at the previous occurrence of a and at the current a, to avoid counting duplicate substrings that start and end at the same indices.
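One way to realise the two-pointer idea from the update (this is an interpretation, not necessarily what the answer intended) is to attribute each substring to the leftmost occurrence of the character it contains. For an occurrence at index i whose previous occurrence is at index last (-1 if none), such a substring must start in last+1..i and end in i..n-1, giving (i - last) * (n - i) substrings, with no duplicates. The function name below is hypothetical.

```python
def count_by_first_occurrence(s: str, c: str) -> int:
    """Count substrings containing c, crediting each to its leftmost c.

    An occurrence at i with previous occurrence at `last` is the leftmost c
    of exactly the substrings starting in last+1..i and ending in i..n-1.
    """
    n = len(s)
    last = -1  # index of the previous occurrence of c, -1 if none yet
    total = 0
    for i, ch in enumerate(s):
        if ch == c:
            total += (i - last) * (n - i)
            last = i
    return total


print(count_by_first_occurrence("abcdaefgabb", "a"))  # 51
```

For abcdaefgabb the occurrences at 0, 4 and 8 contribute 1*11 + 4*7 + 4*3 = 51, the same total the other answers give, in a single linear pass.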

Upvotes: 0

Anthony Labarre

Reputation: 2794

You could turn this around and scan your string for occurrences of your letter. Every time you find an occurrence at some position i, you know that it is, by definition, contained in all substrings that start at or before i and end at or after i. So you only need to store pairs of indices to define substrings instead of storing the substrings explicitly.

That being said, you'll still need O(n²) with this approach: although you don't mind repeated substrings (as your example shows), you don't want to count the same substring twice, so you still have to make sure you don't select the same pair of indices twice.
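A literal sketch of the index-pair idea is shown below (the function name is hypothetical). Each occurrence contributes every (start, end) pair that spans it, and a Python `set` discards pairs already contributed by an earlier occurrence. As the answer says, this stays quadratic: it can store up to n(n+1)/2 pairs.

```python
def count_by_index_pairs(s: str, c: str) -> int:
    """Collect (start, end) index pairs of substrings that contain c.

    Each occurrence at position i adds every pair with start <= i <= end;
    the set ignores pairs already added for an earlier occurrence, so no
    substring is counted twice.
    """
    n = len(s)
    pairs = set()  # (start, end) tuples, inclusive indices
    for i, ch in enumerate(s):
        if ch == c:
            for start in range(i + 1):
                for end in range(i, n):
                    pairs.add((start, end))
    return len(pairs)


print(count_by_index_pairs("abb", "b"))  # 5
```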

Upvotes: 0
