Reputation: 299

Compress a list on recurring letters

While preparing for a test, I am solving tests from previous years.

Write the function compress(lst) that receives a non empty list of repetitive letters and returns a list of tuples, each tuple containing the letter and the number or subsequent repetitions. ( see example)

e.g.:

for:

['a','a', 'b', 'b', 'b', 'c', 'a', 'a']

the function should return:

[('a', 2), ('b', 3), ('c', 1), ('a', 2)]

Here's my code:

def compress(lst):
    res = []
    i = 0
    for letter in lst:
        letter_count = 0
        while i < len(lst) and lst[i] == letter:
            letter_count += 1
            i +=1
        res.append((letter, letter_count))
    return res

My function returns:

[('a', 2), ('a', 0), ('b', 3), ('b', 0), ('b', 0), ('c', 1), ('a', 2), ('a', 0)]

I can see why it does that, but I cant see how to change my code to solve the problem.

Upvotes: 3

Answers (4)

Jean-François Fabre

Reputation: 140246

groupby is the pythonic way. If you want to emulate that, you can do it with O(1) with only one loop, no indexes (not very pythonic) and a variable to store the previous letter:

def compress(lst):
    res = []
    previous_letter = None
    count = 0

    for letter in lst:
        if previous_letter is None:
            # first time
            count += 1
            previous_letter = letter
        elif previous_letter == letter:
            # same letter as before, count
            count += 1
        else:
            # different letter: store result, reset counter
            res.append((previous_letter,count))
            count = 1
            previous_letter = letter

     # last iteration doesn't append the last result, fix this
    res.append((previous_letter,count))
    return res

result:

[('a', 2), ('b', 3), ('c', 1), ('a', 2)]

Upvotes: 1

Mad Physicist

Reputation: 114440

The issue is that the while loop correctly counts occurrences, the for loop marches on inexorably, one character at a time. Since you're already incrementing the index correctly in the while loop, the simplest thing would be to get rid of either the for or while loop entirely. The only purpose to having multiple loops here is to attempt to avoid an if, and as you see, that doesn't really work.

res = []
prev = ''
for letter in lst:
    if prev == letter:
        letter_count += 1
    else:
        if prev:
            res.append((prev, letter_count))
        letter_count = 1
    prev = letter

This is similar to what you'd get from itertools.groupby:

[(k, len(list(g))) for k, g in groupby(lst)]

Upvotes: 1

Austin

Reputation: 26039

Use groupby from itertools module which perfectly fits here:

from itertools import groupby

lst = ['a','a', 'b', 'b', 'b', 'c', 'a', 'a']

print([(k, len(list(v))) for k, v in groupby(lst)])
# [('a', 2), ('b', 3), ('c', 1), ('a', 2)]

Upvotes: 3

rafaelc

Reputation: 59274

(Assuming you want to correct your code)

Save the index of the current analyzed letter and just sum to it how many times this letter repeats, so you skip already analyzed letters

def compress(lst):
    res = []
    i = 0
    ind = 0
    while ind < len(lst):
        letter_count = 0
        while i < len(lst) and lst[i] == lst[ind]:
            letter_count += 1
            i +=1
        res.append((lst[ind], letter_count))
        ind += letter_count
    return res

>>> compress(['a','a', 'b', 'b', 'b', 'c', 'a', 'a'])
[('a', 2), ('b', 3), ('c', 1), ('a', 2)]

Upvotes: 2

Compress a list on recurring letters

Answers (4)

Related Questions