Chenyang
Chenyang

Reputation: 171

Implement itertools.groupby from scratch

I tried to implement a function, groupify(S) that takes in a

string S = 'aabbbbccca'

and returns

[('a', 2), ('b', 4), ('c', 3), ('a', 1)]

I am aware that we may use Python itertools.groupby to do this. However, I am trying to implement this by scratch. I can think of using deque to pop an item from the left, but don't know how to write exact code to do it (i.e. how to obtain [('a', 2), ('b', 4), ('c', 3), ('a', 1)] from an input S = 'aabbbbccca' using deque.). I appreciate any one's help.

UPDATE: Just figured out one solution myself. I have been struggling with deque.popleft() and later figured out that I need to use a holder to save anything popped out (i..e the popped variable below).

# implement groubpy by scratch
# example: given a string 'aabbbbccca' return [('a', 2), ('b', 4), ('c', 3), ('a', 1)]
from collections import deque
import unittest
from itertools import groupby

class Solution:
    def implement_groupby(self, s):
        pair = []
        m = deque(s)
        char = s[0]
        count = 0
        while m:
            popped = m.popleft()
            if char == popped:
                count += 1
            else:
                pair.append((char, count))
                char = popped
                count = 1

        pair.append((char, count))
        return pair

    def use_built_in_groupby(self, s):
        pair = []
        for key, group in groupby(s):
            pair.append((key, len(list(group))))

        return pair

class Test(unittest.TestCase):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.solution = Solution()
        self.s1 = 'aabbbbcccaa'
        self.s2 = 'abc'

    def expectEqual(self, first, second):
        with self.subTest():
            self.assertEqual(first, second)

    def testImplementGroupby(self):
        self.expectEqual(self.solution.implement_groupby(self.s1), [('a', 2), ('b', 4), ('c', 3), ('a', 2)])
        self.expectEqual(self.solution.implement_groupby(self.s2), [('a', 1), ('b', 1), ('c', 1)])

    def testBuildInGroupby(self):
        self.expectEqual(self.solution.use_built_in_groupby(self.s1), [('a', 2), ('b', 4), ('c', 3), ('a', 2)])
        self.expectEqual(self.solution.use_built_in_groupby(self.s2), [('a', 1), ('b', 1), ('c', 1)])

if __name__ == '__main__':
    unittest.main()   


Upvotes: 1

Views: 290

Answers (3)

Serik Zhilkamanov
Serik Zhilkamanov

Reputation: 1

Here is my solution.

def groupify(some_data):
group_list = {}
for i in list(some_data):
    if i in group_list.keys():
        group_list[i] += 1
    else:
        group_list.update({i: int(1)})
return group_list

sample = "aabbbbccca"

print(groupify(sample))

OUTPUT: {'a': 3, 'b': 4, 'c': 3}

Upvotes: 0

pylang
pylang

Reputation: 44455

You are actually trying to implement a run length encodong that compresses a string.

Here is a pure Python implementation without using collections.deque.

Given

A variant of unique_justseen from itertools recipes:

def unique_justseen_(s):
    """Return unique characters as seen from a string.

    Examples
    --------
    >>> s = "aabbbbccca"
    >>> unique_justseen_(s)
    'abca'

    """
    hold, result = "", ""

    for x in s:
        if x == hold:
            continue
        hold = x
        result += x

    return result

Code

def run_length_encode(s):
    """Return a list of compressed strings."""
    result = []

    for x in unique_justseen_(s):
        i = 0
        while (i < len(s)) and (x == s[i]):
            i += 1
            #print(x, s, i)
        s = s[i:]        
        result.append((x, i))

    return result

Demo

run_length_encode("aabbbbccca")
# [('a', 2), ('b', 4), ('c', 3), ('a', 1)]

Details

We iterate the unique grouped characters "abca" via the helper function unique_justseen_ . The inner while loop counts observed matches with the unique characters. s = s[i:] "shrinks" a copy of the initial string by eliminating each group of repeated characters (you can observe this by uncommenting the print statement). The unique character is packaged with the final incremented value i and appended to a resulting list.

Upvotes: 1

AnkushRasgon
AnkushRasgon

Reputation: 820

you can use deque to pop elements from list in the following manner

from collections import deque
d=deque(['4', '3', '2', '1'])
d.pop()
'1'
d.popleft()
'4'

In your case

d=deque([('a', 2), ('b', 4), ('c', 3)])
d.popleft()
print(d)

OUTPUT:

deque([('b', 4), ('c', 3)])

Upvotes: 0

Related Questions