Robin Andrews
Robin Andrews

Reputation: 3804

Avoiding Python Off-by-One Error in RLE Algorithm

EDIT: there's more wrong with this than just an off-by-one error, it seems.

I've got an off-by-one error in the following simple algorithm which is supposed to display the count of letters in a string, along the lines of run-length encoding.

I can see why the last character is not added to the result string, but if I increase the range of i I get index out of range for obvious reasons.

I want to know what the conceptual issue is here from an algorithm design perspective, as well as just getting my code to work.

Do I need some special case code to handle the last item in the original string? Or maybe it makes more sense to be comparing the current character with the previous character, although that poses a problem at the beginning of the algorithm?

Is there a general approach to this kind of algorithm, where current elements are compared to previous/next elements, which avoids index out of range issues?

def encode(text):
    # stores output string
    encoding = ""
    i = 0

    while i < len(text) - 1:
        # count occurrences of character at index i
        count = 1
        while text[i] == text[i + 1]:
            count += 1
            i += 1

        # append current character and its count to the result
        encoding += text[i] + str(count) 
        i += 1

    return encoding

text = "Hello World"
print(encode(text))
# Gives H1e1l2o1 1W1o1r1l1

Upvotes: 1

Views: 192

Answers (2)

One Lyner
One Lyner

Reputation: 2004

If you keep your strategy, you'll have to check i+1 < len(text). This gives something like:

def encode(text): 
    L = len(text) 
    start = 0 
    encoding = '' 
    while start < L: 
        c = text[start] 
        stop = start + 1 
        while stop < L and text[stop] == c: 
            stop += 1 
        encoding += c + str(stop - start) 
        start = stop 
    return encoding

Another way to do things, is to remember the start of each run:

def encode2(text): 
     start = 0 
     encoding = '' 
     for i,c in enumerate(text): 
         if c != text[start]: 
             encoding += text[start] + str(i-start) 
             start = i
     if text:
         encoding += text[start] + str(len(text)-start) 
     return encoding

This allows you to just enumerate the input which feels more pythonic.

Upvotes: 1

Gr&#233;goire Roussel
Gr&#233;goire Roussel

Reputation: 947

You're right, you should have while i < len(text) for the external loop to process the last character if it is different for the previous one (d in your case).

Your algorithm is then globally fine, but it will crash when looking for occurrences of the last character. At this point, text[i+1] becomes illegal.

To solve this, just add a safety check in the internal loop: while i+1 < len(text)

def encode(text):
    # stores output string
    encoding = ""
    i = 0

    while i < len(text):
        # count occurrences of character at index i
        count = 1
        # FIX: check that we did not reach the end of the string 
        # while looking for occurences
        while i+1 < len(text) and text[i] == text[i + 1]:
            count += 1
            i += 1

        # append current character and its count to the result
        encoding += text[i] + str(count) 
        i += 1

    return encoding

text = "Hello World"
print(encode(text))
# Gives H1e1l2o1 1W1o1r1l1d1

Upvotes: 1

Related Questions