Captain

Reputation: 9

How to count the occurrence of two sub-strings from a given string without overlapping in python?

The combined number of occurrences of substring1 and substring2 in 'abbabba' should be 3 (ab, ba, ba), but with str.count() I get 4. Any suggestions?

substring1 = 'ab'   
substring2 = 'ba'
stringg = 'abbabba'
print(stringg.count(substring1) + stringg.count(substring2))

Upvotes: 0

Views: 372

Answers (4)

Maurice Meyer

Reputation: 18106

You need to count manually; this version only works for substrings of equal length:

stringg = 'abbabba'
patterns = {'ab': 0, 'ba': 0}
c = 0

while c < len(stringg) -1:
    substr = stringg[c:c+2]
    if substr in patterns:
        patterns[substr] += 1
        c += 1
    c += 1

print(patterns)
print('Total', sum(patterns.values()))

Output:

{'ab': 1, 'ba': 2}
Total 3
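The same scan can be wrapped in a reusable function. This is a minimal sketch, not part of the original answer: the name count_fixed_width is hypothetical, and it assumes every pattern has the same non-zero length, which it checks up front instead of silently miscounting. After a match it jumps ahead by the full width, which is what the two c += 1 steps do for length 2.

```python
def count_fixed_width(text, patterns):
    """Count non-overlapping matches of equal-length patterns, left to right.

    Hypothetical helper: generalizes the two `c += 1` steps above to any width.
    """
    widths = {len(p) for p in patterns}
    if len(widths) != 1 or 0 in widths:
        raise ValueError('patterns must be non-empty and all the same length')
    width = widths.pop()
    counts = dict.fromkeys(patterns, 0)
    i = 0
    while i <= len(text) - width:
        window = text[i:i + width]
        if window in counts:
            counts[window] += 1
            i += width  # jump past the match so its characters are not reused
        else:
            i += 1
    return counts


counts = count_fixed_width('abbabba', ['ab', 'ba'])
print(counts)                         # {'ab': 1, 'ba': 2}
print('Total', sum(counts.values()))  # Total 3
```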

EDIT: If you have substrings of different lengths, you can additionally loop over them:

stringg = 'abbabbaccccab'
patterns = {'ab': 0, 'ba': 0, 'ccc': 0}
c = 0

while c < len(stringg) -1:
    for pattern in patterns:
        substr = stringg[c:c+len(pattern)]
        if substr == pattern:
            patterns[substr] += 1
            c += len(pattern) - 1
            break
    c += 1

print(patterns)
print('Total', sum(patterns.values()))

Output:

{'ab': 2, 'ba': 2, 'ccc': 1}
Total 5
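A shorter alternative, not from the original answers: the standard re module already scans left to right and returns non-overlapping matches, so an alternation of the patterns should give the same result here. At each position the alternatives are tried in the order listed, much like the inner for loop above, so list a pattern first if it should win when two could start at the same index. This is a sketch: count_non_overlapping is a hypothetical name, and it assumes a non-empty list of non-empty substrings. Patterns that never match are simply absent from the returned Counter.

```python
import re
from collections import Counter


def count_non_overlapping(text, patterns):
    # Build an alternation such as 'ab|ba|ccc'; re.escape guards against
    # substrings that contain regex special characters.
    regex = re.compile('|'.join(map(re.escape, patterns)))
    # findall returns non-overlapping matches, scanning left to right.
    return Counter(regex.findall(text))


counts = count_non_overlapping('abbabbaccccab', ['ab', 'ba', 'ccc'])
print(dict(counts))                   # {'ab': 2, 'ba': 2, 'ccc': 1}
print('Total', sum(counts.values()))  # Total 5
```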

Upvotes: 2

Arne

Reputation: 10545

You could replace the occurrences of the first substring with a character that does not appear in any of the strings before counting the second substring:

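substring1, substring2 = 'ab', 'ba'
stringg = 'abbabba'
# Blank out substring1 so its characters cannot be matched again by substring2
without1 = stringg.replace(substring1, '_')
print(stringg.count(substring1) + without1.count(substring2))
# -> 3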

But be careful: for certain strings the question may not be well-defined, because the result may depend on which substring is counted first.
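To see that caveat concretely, here is the same replace-then-count approach wrapped in a small function (count_replace_first is a hypothetical name, and the placeholder '_' is assumed not to occur in any of the inputs). For 'abbabba' both orders give 3, but for 'abab' counting 'ab' first gives 2 while counting 'ba' first gives 1.

```python
def count_replace_first(text, first, second, placeholder='_'):
    # Count `first`, blank out its occurrences with a placeholder that must
    # not appear in any of the strings, then count `second` in what is left.
    return text.count(first) + text.replace(first, placeholder).count(second)


print(count_replace_first('abbabba', 'ab', 'ba'))  # 3
print(count_replace_first('abbabba', 'ba', 'ab'))  # 3
print(count_replace_first('abab', 'ab', 'ba'))     # 2
print(count_replace_first('abab', 'ba', 'ab'))     # 1
```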

Upvotes: 0

salt-die

Reputation: 854

One can use a custom counting function here: grow a substring with characters from the string we're interested in until it contains a match, then reset it:

def count(string, *substrings):
    acc = ''  # accumulator
    matches = 0
    for char in string:
        acc += char

        for substring in substrings:
            if substring in acc:
                matches += 1
                acc = ''
                break

    return matches

And we call it like so:

count('abbabba', 'ab', 'ba')  # 3
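A variation on count() above, sketched as an assumption rather than part of the original answer: because acc is cleared after every match and grows one character at a time, any new match has to end at the last character. So the accumulator can be replaced by a start index plus str.endswith(suffix, start, end), which avoids rebuilding and rescanning a growing string. The name count_by_suffix is hypothetical, and it returns a collections.Counter so the per-substring totals are available too.

```python
from collections import Counter


def count_by_suffix(string, *substrings):
    start = 0  # beginning of the region not yet consumed by a match
    matches = Counter()
    for end in range(1, len(string) + 1):
        for substring in substrings:
            # True if string[start:end] ends with substring
            if string.endswith(substring, start, end):
                matches[substring] += 1
                start = end  # consume the match, like resetting acc
                break
    return matches


result = count_by_suffix('abbabba', 'ab', 'ba')
print(dict(result))          # {'ab': 1, 'ba': 2}
print(sum(result.values()))  # 3
```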

Upvotes: 0

Alok

Reputation: 8988

This happens because of how the occurrences are counted. The machine is doing the correct thing; you just missed what I am about to show you:

String = abbabba, Substring 1 = ab, Substring 2 = ba

# ab and ba count in stringg abbabba 
# ab = 1 ['ab'babba]
# ba = 1 [ab'ba'bba]
# ab = 1 [abb'ab'ba]
# ba = 1 [abbab'ba']

As shown above, ab occurs 2 times and ba occurs 2 times, so the sum is 4.
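The positions behind those counts can be listed with str.find(), resuming each search just past the previous match. This sketch should reproduce what str.count() counts for a single non-empty substring (occurrences is a hypothetical helper name, not a built-in).

```python
def occurrences(text, sub):
    """Start indices of the non-overlapping matches of sub in text."""
    if not sub:
        raise ValueError('sub must be non-empty')
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume after the match, like str.count(), so matches never overlap
        start = text.find(sub, start + len(sub))
    return positions


text = 'abbabba'
print(occurrences(text, 'ab'))  # [0, 3]
print(occurrences(text, 'ba'))  # [2, 5]
```

Here 'ba' at index 2 covers indices 2-3 and 'ab' at index 3 covers indices 3-4, so the 'a' at index 3 is used by both matches. Because each count() call runs independently, nothing prevents that; dropping one of the two is what brings the total down to 3.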

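So the line print(stringg.count(substring1) + stringg.count(substring2)) is doing its job correctly: each count() call scans the string independently, so it does not ignore characters that were already part of a match for the other substring, which is what you want.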

To get that behavior, you can scan the string manually and skip past each match:

substring1 = 'ab'   
substring2 = 'ba'
stringg = 'abbabba'

i = 0
count = 0
while i <= len(stringg) - 2:
    if stringg[i:i+2] in (substring1, substring2):
        count += 1
        i += 2
    else:
        i += 1

print(count) # OUTPUT 3

Upvotes: 0
