user10281
user10281

Reputation:

Run length encoding in Python

I'm trying to write a simple Python algorithm to solve this problem. Can you please help me figure out how to do this?

If any character is repeated more than 4 times, the entire set of repeated characters should be replaced with a slash '/', followed by a 2-digit number which is the length of this run of repeated characters, and the character. For example, "aaaaa" would be encoded as "/05a". Runs of 4 or less characters should not be replaced since performing the encoding would not decrease the length of the string.

Upvotes: 20

Views: 57466

Answers (10)

Hultner
Hultner

Reputation: 3780

I see many great solutions here but none that feels very pythonic to my eyes. So I'm contributing with a implementation I wrote myself today for this problem.

from typing import Iterator, Tuple
from itertools import groupby

def run_length_encode(data: str) -> Iterator[Tuple[str, int]]:
    """Returns run length encoded Tuples for string"""
    # A memory efficient (lazy) and pythonic solution using generators
    return ((x, sum(1 for _ in y)) for x, y in groupby(data))

This will return a generator of Tuples with the character and number of instances, but can easily be modified to return a string as well. A benefit of doing it this way is that it's all lazy evaluated and won't consume more memory or cpu than needed if you don't need to exhaust the entire search space.

If you still want string encoding the code can quite easily be modified for that use case like this:

def run_length_encode(data: str) -> str:
    """Returns run length encoded string for data"""
    # A memory efficient (lazy) and pythonic solution using generators
    return "".join(f"{x}{sum(1 for _ in y)}" for x, y in groupby(data))

This is a more generic run length encoding for all lengths, and not just for those of over 4 characters. But this could also quite easily be adapted with a conditional for the string if wanted.

Upvotes: 11

Motez Ouledissa
Motez Ouledissa

Reputation: 9

def RLE_comp_encode(text):
    if text == text[0]*len(text) :
        return str(len(text))+text[0]
    else:
        comp_text , r = '' , 1
        for i in range (1,len(text)):
            if text[i]==text[i-1]:
                r +=1
                if i == len(text)-1:
                    comp_text += str(r)+text[i]
            else :
                comp_text += str(r)+text[i-1]
                r = 1
    return comp_text

This worked for me,

Upvotes: 0

Tom Smith
Tom Smith

Reputation: 9

Split=(list(input("Enter string: ")))
Split.append("")
a = 0
for i in range(len(Split)):
    try:
        if (Split[i] in Split) >0:
            a = a + 1
        if Split[i] != Split[i+1]:
            print(Split[i],a)
            a = 0
    except IndexError:
        print()

this is much easier and works everytime

Upvotes: 0

text=input("Please enter the string to encode")
encoded=[]
index=0
amount=1
while index<=(len(text)-1):  
  if index==(len(text)-1) or text[index]!=text[(index+1)]:
    encoded.append((text[index],amount))        
    amount=1
  else:
    amount=amount+1            
  index=index+1   
print(encoded)

Upvotes: -1

Ashwin Joshi
Ashwin Joshi

Reputation: 5

An easy solution to run-length encoding which I can think of:

For encoding a string like "a4b5c6d7...":

def encode(s):
    counts = {}
    for c in s:
        if counts.get(c) is None:
            counts[c] = s.count(c)
    return "".join(k+str(v) for k,v in counts.items())

For decoding a string like "aaaaaabbbdddddccccc....":

def decode(s):
    return "".join((map(lambda tup:  tup[0] * int(tup[1]), zip(s[0:len(s):2], s[1:len(s):2]))))

Fairly easy to read and simple.

Upvotes: -1

Thomas Ahle
Thomas Ahle

Reputation: 31624

Rosetta Code has a lot of implementations, that should easily be adaptable to your usecase.

Here is Python code with regular expressions:

from re import sub

def encode(text):
    '''
    Doctest:
        >>> encode('WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW')
        '12W1B12W3B24W1B14W'    
    '''
    return sub(r'(.)\1*', lambda m: str(len(m.group(0))) + m.group(1),
               text)

def decode(text):
    '''
    Doctest:
        >>> decode('12W1B12W3B24W1B14W')
        'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW'
    '''
    return sub(r'(\d+)(\D)', lambda m: m.group(2) * int(m.group(1)),
               text)

textin = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"
assert decode(encode(textin)) == textin

Upvotes: 8

user786033
user786033

Reputation:

I know this is not the most efficient solution, but we haven't studied functions like groupby() yet so here's what I did:

def runLengthEncode (plainText):
    res=''
    a=''
    count = 0
    for i in plainText:
        count+=1
        if a.count(i)>0:
            a+=i
        else:
            if len(a)>4:
                if len(a)<10:
                    res+="/0"+str(len(a))+a[0][:1]
                else:
                    res+="/" + str(len(a)) + a[0][:1]
                a=i
            else:
                res+=a
                a=i
        if count == len(plainText):
            if len(a)>4:
                if len(a)<10:
                    res+="/0"+str(len(a))+a[0][:1]
                else:
                    res+="/" + str(len(a)) + a[0][:1]
            else:
                res+=a
    return(res)

Upvotes: 0

pkacprzak
pkacprzak

Reputation: 5613

You can use the groupby() function combined with a list/generator comprehension:

from itertools import groupby, imap

''.join(x if reps <= 4 else "/%02d%s" % (reps, x) for x, reps in imap(lambda x: (x[0], len(list(x[1]))), groupby(s)))

Upvotes: -1

Serdalis
Serdalis

Reputation: 10489

Aside for setting a=i after encoding a sequence and setting a width for your int when printed into the string. You could also do the following which takes advantage of pythons groupby. Its also a good idea to use format when constructing strings.

from itertools import groupby

def runLengthEncode (plainText):
    res = []

    for k,i in groupby(plainText):
        run = list(i)
        if(len(run) > 4):
            res.append("/{:02}{}".format(len(run), k))
        else:
            res.extend(run)

    return "".join(res)

Upvotes: 5

Karoly Horvath
Karoly Horvath

Reputation: 96326

Just observe the behaviour:

>>> runLengthEncode("abcd")
'abc'

Last character is ignored. You have to append what you've collected.

>>> runLengthEncode("abbbbbcd")
'a/5b/5b'

Oops, problem after encoding. You should set a=i even if you found a long enough sequence.

Upvotes: 2

Related Questions