Yue Zequn
Yue Zequn

Reputation: 11

trying to find a more efficient algorithm

A binary string is a string where each character is either '0' or '1'. A binary string S is good if and only if for every substring of S, the number of '0's is less than or equal to the number of '1's.

A substring of string S can be obtained by deleting several (or zero) characters from the beginning of S and several (or zero) characters from the end of S, leaving at least two characters from S. For example, if S = "abcdfab", then "abcd", "fab", "bcdfa", "abcdfab" are substrings of S, while "abab", "", "d", and "adcdab" are not substring of S.

I am trying to figure out a way to check if a binary string of length N is good.

The naive way is to check every substring of this binary string which has the time complexity of O(N^2), is there an algorithm which can have a time complexity of O(N) or better?

Upvotes: 0

Views: 72

Answers (2)

paxdiablo
paxdiablo

Reputation: 881283

In order for every single (two-or-more-character) substring to have at least as many ones as zeros, there can be no consecutive zeros - in fact, you need at least two ones between every pair of zeros. That's because the string 010 is not valid but there's no two-character substring of 0110 that violates your constraint.

So the algorithm would be:

set oneCount to 2 # Zero at start of string is okay.
for each character ch in string:
    if ch is 0:
        if oneCount is less than 2:
            generate error
        set oneCount to 0
    else:
        increment oneCount

That's actually O(n) time but you can actually do better than that by amortising the cost. Since there may be multiple modifying operations on the string before checking, you can defer the checking to bring the amortised cost down.

By that, I mean cache the validity of the string so that, if you check without having modified it, just return the cached value instead of re-checking. That's the coarsest level of amortisation but may result in improvements just on its own:

define string = "", dirty = false, cachedValidity = true

define isValid():
    if dirty:
        cachedValidity = true
        set oneCount to 2
        for each character ch in string:
            if ch is 0:
                if oneCount is less than 2:
                    cachedValidity = false
                    exit for loop
                set oneCount to 0
            else:
                increment oneCount
    dirty = false
    return cachedValidity

Then you just have to ensure that any operation on the string sets dirty to true, something that's actually quite easy in an object-oriented language.

For even more optimisation, you don't need to set dirty to true for all operations, it actually depends on what you're doing. The string validity will not change under (at least) the following conditions:

  1. If you insert a one next to another one.
  2. If you delete a one that was between two other ones.
  3. If you insert a zero right in the middle if four consecutive ones.

None of these operations affect the validity so they should not set the dirty flag. There may be others you can discover under analysis but that's a good start and will allow you to further amortise the cost of checking.

Upvotes: 5

nipunasudha
nipunasudha

Reputation: 2607

Each pair of 0s should have at least two 1s between them. It is O(n).

def isGood(bstr):
    count = 2
    for i in bstr:
        if i == '0':
            if count >= 2:
                count = 0
            else:
                return False
        else:
            count += 1
    return True

print(isGood("0010101"))

Hope this helped.

Upvotes: 2

Related Questions