Reputation: 47
What is the most efficient algorithm to count the substrings of a given string that contain a given character?
E.g. for the string abb and the character b:
Substrings: a, b, b, ab, bb, abb. Answer: substrings containing b at least once = 5.
PS. I solved this by generating all the substrings and checking each one, in O(n^2). I just want to know whether there is a better solution.
Upvotes: 0
Views: 3098
Reputation: 80187
Suppose you need to count the substrings that contain the character X.
Scan the string from left to right, keeping the position of the last X seen in lastX, with starting value -1.
When you meet X at position i, add i+1 to the result and update lastX (this is the number of substrings ending at the current position, and they all contain X).
When you meet any other character, add lastX + 1 to the result (this is again the number of substrings ending at the current position and containing X), because the rightmost possible start of such a substring is the position of the last X.
The algorithm is linear.
Example: a X a a X a

idx  char  good substrings ending at idx   lastX  count  overall count
0    a     -                               -1     0      0
1    X     aX X                            1      2      2
2    a     aXa Xa                          1      2      4
3    a     aXaa Xaa                        1      2      6
4    X     aXaaX XaaX aaX aX X             4      5      11
5    a     aXaaXa XaaXa aaXa aXa Xa        4      5      16
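Python code:

    def subcnt(s, c):
        last = -1   # index of the most recent occurrence of c
        cnt = 0
        for i in range(len(s)):
            if s[i] == c:
                last = i
            # substrings ending at i that contain c start at 0..last
            cnt += last + 1
        return cnt

    print(subcnt('abcdba', 'b'))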
Upvotes: 3
Reputation: 9085
Think of a substring as selecting two elements from the gaps between the letters in your string and including everything between them (where there are gaps on the extreme ends of the string).
For a string of length n, there are choose(n+1,2) substrings.
Of those, for each maximal run of k consecutive characters that doesn't include the target, there are choose(k+1,2) substrings that only include letters from that run. All other substrings of the main string must include the target.
Answer: choose(n+1,2) - sum(choose(k_i+1,2)), where the k_i are the lengths of runs of letters that don't include the target.
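The formula above can be sketched in Python roughly as follows. The function name count_by_complement is made up for illustration, and choose(m+1,2) is written out as m*(m+1)//2. It makes a single pass over the string, so it should also be linear.

```python
def count_by_complement(s, c):
    # Hypothetical helper name, not from the answer above.
    n = len(s)
    total = n * (n + 1) // 2              # choose(n+1, 2): all non-empty substrings
    run = 0                               # length of the current run without c
    for ch in s:
        if ch == c:
            total -= run * (run + 1) // 2  # choose(run+1, 2) substrings avoid c
            run = 0
        else:
            run += 1
    total -= run * (run + 1) // 2         # run after the last occurrence of c
    return total

print(count_by_complement('abb', 'b'))    # 5, matching the question's example
```

For 'abb' and 'b' the only run without the target is 'a', so the result is 6 - 1 = 5.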
Upvotes: 0
Reputation: 17805
Let's consider the string abcdaefgabb and the given character a.
Take the a at index 4: the substrings containing it that we count range from abcda (ending at this a) to aefgabb (starting at this a). So, we add (4 - 0 + 1) + (10 - 4) = 11. These are the substrings abcda, bcda, cda, da, a, ae, aef, aefg, aefga, aefgab and aefgabb.
Do the same for every other a, such as the ones you find at index 0 and at index 8.
Update: You will have to maintain 2 pointers, one at the last a seen and one at the current a, to avoid counting duplicate substrings that start and end at the same indices.
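One possible reading of the update, sketched in Python: keep the index of the previous a and count, for each a at index i, only the substrings whose first a is that one. Their start lies after the previous a and at or before i, and their end lies at or after i. This is an interpretation rather than the exact procedure described above, and the name count_by_first_occurrence is hypothetical.

```python
def count_by_first_occurrence(s, c):
    # Hypothetical helper: each substring is counted once, at the first c it contains.
    n = len(s)
    prev = -1                     # index of the previous occurrence of c
    total = 0
    for i, ch in enumerate(s):
        if ch == c:
            starts = i - prev     # start positions prev+1 .. i
            ends = n - i          # end positions i .. n-1
            total += starts * ends
            prev = i
    return total

print(count_by_first_occurrence('abcdaefgabb', 'a'))  # 51
```

Under this reading, the a at index 0 contributes 1 * 11, the a at index 4 contributes 4 * 7, and the a at index 8 contributes 4 * 3, with no substring counted twice.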
Upvotes: 0
Reputation: 2794
You could turn this around and scan your string for occurrences of your letter. Every time you find an occurrence at some position i, you know it is contained in every substring that starts at or before i and ends at or after i, so you only need to store pairs of indices to define substrings instead of storing substrings explicitly.
That being said, you'll still need O(n²) with this approach because although you don't mind repeated substrings as your example shows, you don't want to count the same substring twice, so you still have to make sure that you don't select the same pair of indices twice.
Upvotes: 0