Scott B
Scott B

Reputation: 2602

Python/Regex - Expansion of Parentheses and Slashes

I'm looking for a way to expand numbers that are separated by slashes. In addition to the slashes, parentheses, may be used around some (or all) numbers to indicate a "group" which may be repeated (by the number of times directly following the parentheses) or repeated in reverse (followed by 's' as shown in the second set of examples). Some examples are:

1          -> ['1']                          -> No slashes, no parentheses
1/2/3/4    -> ['1', '2', '3', '4']           -> basic example with slashes
1/(2)4/3   -> ['1', '2', '2', '2', '2', '3'] -> 2 gets repeated 4 times
1/(2/3)2/4 -> ['1', '2', '3', '2', '3', '4'] -> 2/3 is repeated 2 times
(1/2/3)2   -> ['1', '2', '3', '1', '2', '3'] -> Entire sequence is repeated twice
(1/2/3)s   -> ['1', '2', '3', '3', '2', '1'] -> Entire sequence is repeated in reverse
1/(2/3)s/4 -> ['1', '2', '3', '3', '2', '4'] -> 2/3 is repeated in reverse

In the most general case, there could even be nested parentheses, which I know generally make the use of regex impossible. In the current set of data I need to process, there are no nested sets like this, but I could see potential use for it in the future. For example:

1/(2/(3)2/4)s/5 -> 1/(2/3/3/4)s/5
                -> 1/2/3/3/4/4/3/3/2/5
                -> ['1', '2', '3', '3', '4', '4', '3', '3', '2', '5']

I know of course that regex cannot do all of this (especially with the repeating/reversing sets of parenthesis). But if I can get a regex that at least separates the strings of parenthesis from those not in parenthesis, then I could probably make some loop pretty easily to take care of the rest. So, the regex I'd be looking for would do something like:

1               -> ['1']
1/2/3/4         -> ['1', '2', '3', '4']
1/(2)4/3        -> ['1', '(2)4', '3']
1/(2/3)2/4      -> ['1', '(2/3)2', '4']
1/(2/(3)2/4)s/5 -> ['1', '(2/(3)/2/4)s', '5']

And then I could loop on this result and continue expanding any parentheses until I have only digits.

EDIT

I wasn't totally clear in my original post. In my attempt to make the examples as simple as possible, I perhaps oversimplified them. This needs to work for numbers >= 10 as well as negative numbers.

For example:

1/(15/-23)s/4   -> ['1', '(15/-23)s', '4']
                -> ['1', '15', '-23', '-23', '15', '4']

Upvotes: 5

Views: 733

Answers (3)

Rohit Jain
Rohit Jain

Reputation: 213401

Since you are dealing with nested parenthesis, regex can't help you much here. It cannot easily convert the string to the list, as you wanted at the end.

You would better go with parsing the string yourself. You can try this code, just to meet your requirement at the end:

Parsing string into list without loosing parenthesis:

def parse(s):

    li = []

    open = 0
    closed = False
    start_index = -1

    for index, c in enumerate(s):
        if c == '(':
            if open == 0:
                start_index = index
            open += 1

        elif c == ')':
            open -= 1
            if open == 0:
                closed = True

        elif closed:
            li.append(s[start_index: index + 1])
            closed = False

        elif open == 0 and c.isdigit():
            li.append(c)

    return li

This will give you for the string '1/(2/(3)2/4)s/5' the following list:

['1', '(2/(3)2/4)s', '5']

and for the string '1/(15/-23)s/4', as per your changed requirement, this gives:

['1', '(15/-23)s', '4']

Now, you need to take care of the breaking the parenthesis further up to get different list elements.


Expanding the strings with parenthesis to individual list elements:

Here you can make use of a regex, by just dealing with inner-most parenthesis at once:

import re

def expand(s):

    ''' Group 1 contains the string inside the parenthesis
        Group 2 contains the digit or character `s` after the closing parenthesis

    '''    
    match = re.search(r'\(([^()]*)\)(\d|s)', s)
    if match:
        group0 = match.group()
        group1 = match.group(1)
        group2 = match.group(2)
        if group2.isdigit():
            # A digit after the closing parenthesis. Repeat the string inside
            s = s.replace(group0, ((group1 + '/') * int(group2))[:-1])
        else:
            s = s.replace(group0, '/'.join(group1.split('/') + group1.split('/')[::-1]))

    if '(' in s:
        return expand(s)

    return s

li = parse('1/(15/-23)2/4')

for index, s in enumerate(li):
    if '(' in s:
        s = expand(s)
        li[index] = s.split('/')

import itertools
print list(itertools.chain(*li))

This will give you the required result:

['1', '15', '-23', '-23', '15', '4']

The above code iterates over the list generated from parse(s) method, and then for each element, recursively expands the inner most parenthesis.

Upvotes: 3

Sean McSomething
Sean McSomething

Reputation: 6517

Regular expressions are the completely wrong tool for this job. There's a long, drawn out explanation as to why regular expressions are not appropriate (If you want to know why, here's an online course). A simple recursive parser is easy enough to write to handle this that you'd probably be done with it well before you finished debugging your regular expression.

It's a slow day so I took it upon myself to write it myself, complete with doctests.

def parse(s):
    """
    >>> parse('1')
    ['1']
    >>> parse('1/2/3/4')
    ['1', '2', '3', '4']
    >>> parse('1/(2)4/3')
    ['1', '2', '2', '2', '2', '3']
    >>> parse('1/(2/3)2/4')
    ['1', '2', '3', '2', '3', '4']
    >>> parse('(1/2/3)2')
    ['1', '2', '3', '1', '2', '3']
    >>> parse('1/(2/3)s/4')
    ['1', '2', '3', '3', '2', '4']
    >>> parse('(1/2/3)s')
    ['1', '2', '3', '3', '2', '1']
    >>> parse('1/(2/(3)2/4)s/5')
    ['1', '2', '3', '3', '4', '4', '3', '3', '2', '5']
    """
    return _parse(list(s))


def _parse(chars):
    output = []
    while len(chars):
        c = chars.pop(0)
        if c == '/':
            continue
        elif c == '(':
            sub = _parse(chars)
            nextC = chars.pop(0)
            if nextC.isdigit():
                n = int(nextC)
                sub = n * sub
                output.extend(sub)
            elif nextC == 's':
                output.extend(sub)
                output.extend(reversed(sub))
        elif c == ')':
            return output
        else:
            output.extend(c)

    return output

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Upvotes: 0

Joohwan
Joohwan

Reputation: 2512

Here is another way to get that done.

def expand(string):

    level = 0
    buffer = ""
    container = []

    for char in string:

        if char == "/":
            if level == 0:
                container.append(buffer)
                buffer = ""
            else:
                buffer += char
        elif char == "(":
            level += 1
            buffer += char
        elif char == ")":
            level -= 1
            buffer += char
        else:
            buffer += char

    if buffer != "":
        container.append(buffer)

    return container

Upvotes: 3

Related Questions