kano
kano

Reputation: 5920

Python - packing/unpacking by letters

I'm just starting to learn python and I have this exercise that's puzzling me: Create a function that can pack or unpack a string of letters. So aaabb would be packed a3b2 and vice versa.

For the packing part of the function, I wrote the following

def packer(s):
    if s.isalpha():        # Defines if unpacked

    stack = []

        for i in s:
            if s.count(i) > 1:
                if (i + str(s.count(i))) not in stack:
                    stack.append(i + str(s.count(i)))
            else:
                stack.append(i)

        print "".join(stack)

    else:
        print "Something's not quite right.."
        return False
packer("aaaaaaaaaaaabbbccccd")

This seems to work all proper. But the assignment says that if the input has (for example) the letter a after b or c, then it should later be unpacked into it's original form. So "aaabbkka" should become a3b2k2a, not a4b2k2. I hence figured, that I cannot use the "count()" command, since that counts all occurrences of the item in the whole string, correct? What would be my options here then?

On to the unpacking - I've thought of the basics what my code needs to do -

  1. between the " if s.isalpha():" and else, I should add an elif that checks whether or not the string has digits in it. (I figured this would be enough to determine whether it's the packed version or unpacked).
  2. Create a for loop and inside of it an if sentence, which then checks for every element:

    2.1. If it has a number behind it > Return (or add to an empty stack) the number times the digit
    2.2. If it has no number following it > Return just the element.

Big question number 2 - how do I check whether it's a number or just another alphabetical element following an element in the list? I guess this must be done with slicing, but those only take integers. Could this be achieved with the index command?

Also - if this is of any relevance - so far I've basically covered lists, strings, if and for and I've been told this exercise is doable with just those (...so if you wouldn't mind keeping this really basic)

All help appreciated for the newbie enthusiast!


SOLVED:

def packer(s):
    if s.isalpha():        # Defines if unpacked
        groups= []
        last_char = None

        for c in s:
            if c == last_char:
                groups[-1].append(c)
            else:
                groups.append([c])
            last_char = c

        return ''.join('%s%s' % (g[0], len(g)>1 and len(g) or '') for g in groups)

    else:                   # Seems to be packed

        stack = ""

        for i in range(len(s)):
            if s[i].isalpha():
                if i+1 < len(s) and s[i+1].isdigit():
                    digit = s[i+1]
                    char = s[i]
                    i += 2

                    while i < len(s) and s[i].isdigit():
                        digit +=s[i]
                        i+=1
                    stack += char * int(digit)

                else:
                    stack+= s[i]
            else:
                ""
        return "".join(stack) 
print (packer("aaaaaaaaaaaabbbccccd"))
print (packer("a4b19am4nmba22")) 

So this is my final code. Almost managed to pull it all off with just for loops and if statements. In the end though I had to bring in the while loop to solve reading the multiple-digit numbers issue. I think I still managed to keep it simple enough. Thanks a ton millimoose and everyone else for chipping in!

Upvotes: 0

Views: 1409

Answers (3)

millimoose
millimoose

Reputation: 39950

This is an implementation of the algorithm I outlined in the comments:

from itertools import takewhile, count, islice, izip

def consume(items):
    from collections import deque
    deque(items, maxlen=0)

def ilen(items):
    result = count()
    consume(izip(items, result))
    return next(result)

def pack_or_unpack(data):
    start = 0
    result = []

    while start < len(data):
        if data[start].isdigit():
            # `data` is packed, bail
            return unpack(data)
        run = run_len(data, start)

        # append the character that might repeat
        result.append(data[start]) 

        if run > 1:
            # append the length of the run of characters
            result.append(str(run))

        start += run

    return ''.join(result)


def run_len(data, start):
    """Return the end index of the run of identical characters starting at 
    `start`"""
    return start + ilen(takewhile(lambda c: c == data[start], 
                                  islice(data, start, None)))

def unpack(data):
    result = []
    for i in range(len(data)):
        if data[i].isdigit():
            # skip digits, we'll look for them below
            continue

        # packed character
        c = data[i]
        # number of repetitions
        n = 1
        if (i+1) < len(data) and data[i+1].isdigit():
            # if the next character is a digit, grab all the digits in the 
            # substring starting at i+1
            n = int(''.join(takewhile(str.isdigit, data[i+1:])))

        # append the repeated character
        result.append(c*n) # multiplying a string with a number repeats it
    return ''.join(result)

print pack_or_unpack('aaabbc')
print pack_or_unpack('a3b2c')
print pack_or_unpack('a10')
print pack_or_unpack('b5c5')
print pack_or_unpack('abc')

A regex-flavoured version of unpack() would be:

import re
UNPACK_RE = re.compile(r'(?P<char> [a-zA-Z]) (?P<count> \d+)?', re.VERBOSE)
def unpack_re(data):
    matches = UNPACK_RE.finditer(data)
    pairs = ((m.group('char'), m.group('count')) for m in matches)
    return ''.join(char * (int(count) if count else 1)
                   for char, count in pairs)

This code demonstrates the most straightforward (or "basic") approach of implementing that algorithm. It's not particularly elegant or idiomatic or necessarily efficient. (It would be if written in C, but Python has the caveats such as: indexing a string copies the character into a new string, and algorithms that seem to copy data excessively might be faster than trying to avoid this if the copying is done in C and the workaround was implemented with a Python loop.)

Upvotes: 2

Kabie
Kabie

Reputation: 10663

A straightforward solution: If a char is different, make a new group. Otherwise append it to the last group. Finally count all groups and join them.

def packer(s):
    groups = []
    last_char = None
    for c in s:
        if c == last_char:
            groups[-1].append(c)
        else:
            groups.append([c])
        last_char = c
    return ''.join('%s%s'%(g[0], len(g)) for g in groups)

Another approach is using re.

Regex r'(.)\1+' can match consecutive characters longer than 1. And with re.sub you can easily encode it:

regex = re.compile(r'(.)\1+')

def replacer(match):
    return match.group(1) + str(len(match.group(0)))

regex.sub(replacer, 'aaabbkka')
#=> 'a3b2k2a'

Upvotes: 2

oleg
oleg

Reputation: 4182

I think You can use `itertools.grouby' function

for example

import itertools
data = 'aaassaaasssddee'
groupped_data = ((c, len(list(g))) for c, g in itertools.groupby(data))
result = ''.join(c + (str(n) if n > 1 else '') for c, n in groupped_data)

of course one can make this code more readable using generator instead of generator statement

Upvotes: 2

Related Questions