Reputation: 1618

Removing data between double squiggly brackets with nested sub brackets in python

I'm having some difficulty with this problem. I need to remove all data that's contained in squiggly brackets.

Like such:

Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there.

Becomes:

Hello there.

Here's my first try (I know it's terrible):

while 1:
    firstStartBracket = text.find('{{')
    if (firstStartBracket == -1):
        break;
    firstEndBracket = text.find('}}')
    if (firstEndBracket == -1):
        break;
    secondStartBracket = text.find('{{',firstStartBracket+2);
    lastEndBracket = firstEndBracket;
    if (secondStartBracket == -1 or secondStartBracket > firstEndBracket):
        text = text[:firstStartBracket] + text[lastEndBracket+2:];
        continue;
    innerBrackets = 2;
    position = secondStartBracket;
    while innerBrackets:
        print innerBrackets;
        #everytime we find a next start bracket before the ending add 1 to inner brackets else remove 1
        nextEndBracket = text.find('}}',position+2);
        nextStartBracket = text.find('{{',position+2);
        if (nextStartBracket != -1 and nextStartBracket < nextEndBracket):
            innerBrackets += 1;
            position = nextStartBracket;
            # print text[position-2:position+4];
        else:
            innerBrackets -= 1;
            position = nextEndBracket;
            # print text[position-2:position+4];
            # print nextStartBracket
            # print lastEndBracket
            lastEndBracket = nextEndBracket;
        print 'pos',position;
    text = text[:firstStartBracket] + text[lastEndBracket+2:];

It seems to work but runs out of memory quite fast. Is there any better way to do this (hopefully with regex)?

EDIT: I was not clear so I'll give another example. I need to allow for multiple top level brackets.

Like such:

Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend.

Becomes:

Hello there friend.

Upvotes: 13

Answers (7)

bobble bubble

Reputation: 18555

With the PyPI regex module, which supports recursive patterns, you can do it like this:

p = r'{{(?>[^}{]+|(?0))*}} ?'

See this demo at regex101 or Python demo at tio.run.

import regex as re

text = re.sub(p, '', text)  # text holds the input string; avoids shadowing the built-in str

FYI, see also: Regular expression to match balanced parentheses

Upvotes: 0

Moritz

Reputation: 5418

This question is fun. Here is my attempt:

import re

def find_str(string):
    # Yield the index of every character that lies outside all braces.
    flag = 0  # current brace nesting depth
    for index, item in enumerate(string):
        if item == '{':
            flag += 1
        elif item == '}':
            flag -= 1
        elif flag == 0:
            yield index

s = 'Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend.'

s = ''.join(s[i] for i in find_str(s))

# Collapse the runs of spaces left behind where the braces were removed.
re.sub(r'\s+', ' ', s)

'Hello there friend.'

Upvotes: 1

gil

Reputation: 2144

For good measure, yet another solution. It starts by finding and replacing the leftmost innermost braces and works its way outwards, rightwards. Takes care of multiple top level braces.

import re

def remove_braces(s):
    pattern = r'\{\{(?:[^{]|\{[^{])*?\}\}'
    while re.search(pattern, s):
        s = re.sub(pattern, '', s)
    return s

Not the most efficient, but short.

>>> remove_braces('Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend.')
'Hello  there  friend.'
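
As a possible variation on the loop above (a sketch, not a benchmarked improvement): re.subn returns the number of replacements it made, so the separate re.search on each pass can be dropped. The names INNERMOST and remove_braces_subn are made up for this example.

```python
import re

# Same pattern as above: an innermost {{...}} group, i.e. one with no '{{' inside it.
INNERMOST = re.compile(r'\{\{(?:[^{]|\{[^{])*?\}\}')

def remove_braces_subn(s):
    """Strip innermost groups repeatedly until a pass replaces nothing."""
    while True:
        s, count = INNERMOST.subn('', s)
        if count == 0:
            return s

print(remove_braces_subn('Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend.'))
```

Each pass still rescans the whole string, so the number of passes grows with the nesting depth.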

Upvotes: 1

Jason S

Reputation: 13799

This is a regex/generator based solution that works with any number of braces. This problem does not need an actual stack because there is only 1 type (well, pair) of token involved. The level fills the role that a stack would fill in a more complex parser.

import re

def _parts_outside_braces(text):
    level = 0
    for part in re.split(r'(\{\{|\}\})', text):
        if part == '{{':
            level += 1
        elif part == '}}':
            level = level - 1 if level else 0
        elif level == 0:
            yield part

x = 'Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there.  {{ second set {{ of }} braces }}'
print(''.join(_parts_outside_braces(x)))

More general points... the capture group in the regex is what makes the braces show up in the output of re.split, otherwise you only get the stuff in between. There's also some support for mismatched braces. For a strict parser, that should raise an exception, as should running off the end of the string with level > 0. For a loose, web-browser style parser, maybe you would want to display those }} as output...
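
To illustrate the strict option mentioned above, here is a minimal sketch that raises on a stray }} and on running off the end of the string with level > 0. The function name strip_braced_strict and the choice of ValueError are assumptions for this example, not part of the code above.

```python
import re

def strip_braced_strict(text):
    """Return the text outside {{...}} groups; raise ValueError on unbalanced braces."""
    level = 0
    kept = []
    for part in re.split(r'(\{\{|\}\})', text):
        if part == '{{':
            level += 1
        elif part == '}}':
            if level == 0:
                # A closing token with nothing open: reject instead of ignoring it.
                raise ValueError("unmatched '}}'")
            level -= 1
        elif level == 0:
            kept.append(part)
    if level:
        # Ran off the end of the string while still inside braces.
        raise ValueError("missing %d closing '}}'" % level)
    return ''.join(kept)

print(strip_braced_strict('Hello {{world of the {{ crazy}} sea }} there.'))
```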

Upvotes: 4

ShellayLee

Reputation: 367

The problem is that you have to deal with a nested structure, which means a regular expression may not suffice. However, a simple parser that remembers the depth level comes to the rescue. It is very simple to write: just store the depth level in a variable.

Here is a more Pythonic way of writing the solution, which may be a good reference for you.

import re

def rem_bra(inp):
    i = 0
    lvl = 0
    chars = []
    while i < len(inp):
        if inp[i:i+2] == '{{':
            lvl += 1
            i += 1
        elif inp[i:i+2] == '}}':
            lvl -= 1
            i += 1
        else:
            if lvl < 1:
                chars.append(inp[i])
        i += 1
    result = ''.join(chars)

    # If you want to collapse runs of contiguous whitespace, keep this line:
    result = re.sub(r'\s\s+', r' ', result)

    return result


inp = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there."

print(rem_bra(inp))
# prints: Hello there.

Upvotes: 1

alecxe

Reputation: 474201

You can use the pyparsing module here. This solution is based on this answer:

from pyparsing import nestedExpr


s = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend."

expr = nestedExpr('{{', '}}')
result = expr.parseString("{{" + s + "}}").asList()[0]
print(" ".join(item for item in result if not isinstance(item, list)))

Prints:

Hello there friend.

Alternatively, if there is only one top-level pair of braces, you can remove everything inside the double curly braces, along with the braces themselves, using a greedy regex:

>>> import re
>>> 
>>> s = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there."
>>> re.sub(r"\{\{.*\}\} ", "", s)
'Hello there.'

The pattern \{\{.*\}\} followed by a space matches double curly braces, then any characters any number of times (intentionally left "greedy"), then double curly braces and a space.
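
A quick check of why the greedy pattern is limited to a single top-level pair: with two top-level groups, the greedy .* runs from the first {{ to the last }}, so the text between the groups is deleted as well. The variable names below are just for this demonstration.

```python
import re

GREEDY = r"\{\{.*\}\} "

one_group = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there."
two_groups = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend."

print(re.sub(GREEDY, "", one_group))   # Hello there.
print(re.sub(GREEDY, "", two_groups))  # Hello friend.  ("there" is lost)
```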

Upvotes: 4

Ren

Reputation: 2946

Try the following code:

import re

s = 'Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there'
m = re.search('(.*?) {.*}(.*)',s)
result = m.group(1) + m.group(2)
print(result)

Upvotes: 1
