Reputation: 1618
I'm having some difficulty with this problem. I need to remove all text contained in double curly braces, including any nested pairs.
For example:
Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there.
Becomes:
Hello there.
Here's my first try (I know it's terrible):
while 1:
    firstStartBracket = text.find('{{')
    if (firstStartBracket == -1):
        break;
    firstEndBracket = text.find('}}')
    if (firstEndBracket == -1):
        break;
    secondStartBracket = text.find('{{',firstStartBracket+2);
    lastEndBracket = firstEndBracket;
    if (secondStartBracket == -1 or secondStartBracket > firstEndBracket):
        text = text[:firstStartBracket] + text[lastEndBracket+2:];
        continue;
    innerBrackets = 2;
    position = secondStartBracket;
    while innerBrackets:
        print innerBrackets;
        #everytime we find a next start bracket before the ending add 1 to inner brackets else remove 1
        nextEndBracket = text.find('}}',position+2);
        nextStartBracket = text.find('{{',position+2);
        if (nextStartBracket != -1 and nextStartBracket < nextEndBracket):
            innerBrackets += 1;
            position = nextStartBracket;
            # print text[position-2:position+4];
        else:
            innerBrackets -= 1;
            position = nextEndBracket;
            # print text[position-2:position+4];
            # print nextStartBracket
            # print lastEndBracket
            lastEndBracket = nextEndBracket;
        print 'pos',position;
    text = text[:firstStartBracket] + text[lastEndBracket+2:];
It seems to work but runs out of memory quite fast. Is there any better way to do this (hopefully with regex)?
EDIT: I was not clear so I'll give another example. I need to allow for multiple top level brackets.
Like such:
Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend.
Becomes:
Hello there friend.
Upvotes: 13
Views: 1254
Reputation: 18555
With the PyPI regex module and a recursive pattern, for example:
import regex as re
# (?0) recurses into the whole pattern; (?>...) is an atomic group
p = r'{{(?>[^}{]+|(?0))*}} ?'
text = re.sub(p, '', text)
See this demo at regex101 or Python demo at tio.run.
FYI: Regular expression to match balanced parentheses
Upvotes: 0
Reputation: 5418
This is a fun question. Here is my attempt:
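import re

def find_str(string):
    # Yield the index of every character seen while the brace depth is 0.
    # The '}' that closes a top-level group is yielded too; it is stripped below.
    flag = 0
    for index, item in enumerate(string):
        if item == '{':
            flag += 1
        if item == '}':
            flag -= 1
        if flag == 0:
            yield index

s = 'Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend.'
index = list(find_str(s))
l = [s[i] for i in index]
# Join without a separator so the kept characters are not spaced apart
s = ''.join(l)
# Drop each leftover closing brace together with the whitespace after it
print(re.sub(r'\}\s+', '', s))
Hello there friend.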
Upvotes: 1
Reputation: 2144
For good measure, yet another solution. It repeatedly removes the innermost brace pairs, starting from the left and working its way outwards until none remain. It handles multiple top-level braces.
import re

def remove_braces(s):
    pattern = r'\{\{(?:[^{]|\{[^{])*?\}\}'
    while re.search(pattern, s):
        s = re.sub(pattern, '', s)
    return s
Not the most efficient, but short.
>>> remove_braces('Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend.')
'Hello there friend.'
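The loop above scans the string twice per pass, once with re.search and once with re.sub. Here is a minimal sketch of the same innermost-first idea using re.subn, which returns the number of substitutions so a single call per pass is enough. The name remove_braces_subn is hypothetical and not part of the original answer, and the repeated passes remain; only the extra search goes away.

```python
import re

# Same pattern as above: a {{...}} pair that contains no nested '{{'.
INNERMOST = re.compile(r'\{\{(?:[^{]|\{[^{])*?\}\}')

def remove_braces_subn(s):
    """Strip innermost {{...}} pairs repeatedly until a pass removes nothing."""
    while True:
        # subn returns (new_string, number_of_substitutions_made)
        s, count = INNERMOST.subn('', s)
        if count == 0:
            return s
```

Like the original, this leaves the whitespace on either side of a removed group untouched, so a separate whitespace cleanup may still be wanted.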
Upvotes: 1
Reputation: 13799
This is a regex/generator based solution that works with any number of braces. This problem does not need an actual stack because there is only 1 type (well, pair) of token involved. The level variable fills the role that a stack would fill in a more complex parser.
import re

def _parts_outside_braces(text):
    level = 0
    for part in re.split(r'(\{\{|\}\})', text):
        if part == '{{':
            level += 1
        elif part == '}}':
            level = level - 1 if level else 0
        elif level == 0:
            yield part

x = 'Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there. {{ second set {{ of }} braces }}'
print(''.join(_parts_outside_braces(x)))
More general points: the capture group in the regex is what makes the braces show up in the output of re.split; otherwise you only get the stuff in between. There's also some support for mismatched braces: an unmatched }} at level 0 is silently ignored. For a strict parser, that should raise an exception, as should running off the end of the string with level > 0. For a loose, web-browser style parser, maybe you would want to display those }} as output.
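To illustrate the strict behaviour described above, here is a minimal sketch that raises instead of ignoring mismatched braces. The function name strip_braces_strict and the choice of ValueError are assumptions for illustration, not part of the original answer.

```python
import re

def strip_braces_strict(text):
    """Return text with every balanced {{...}} group removed.

    Raises ValueError on a stray '}}' or on a '{{' that is never closed.
    """
    level = 0
    kept = []
    # The capture group keeps the '{{' and '}}' tokens in the split output.
    for part in re.split(r'(\{\{|\}\})', text):
        if part == '{{':
            level += 1
        elif part == '}}':
            if level == 0:
                raise ValueError("unmatched '}}'")
            level -= 1
        elif level == 0:
            kept.append(part)
    if level:
        raise ValueError("unclosed '{{'")
    return ''.join(kept)
```

A loose parser would append the stray }} to kept instead of raising; which behaviour is right depends on where the input comes from.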
Upvotes: 4
Reputation: 367
The problem is that you have to deal with a nested structure, which means a regular expression alone may not suffice. However, a simple parser that remembers the depth level can come to the rescue. It is very simple to write: just store the depth level in a variable.
Here is a more Pythonic way of writing the solution, which may be a good reference for you.
import re

def rem_bra(inp):
    i = 0
    lvl = 0
    chars = []
    while i < len(inp):
        if inp[i:i+2] == '{{':
            lvl += 1
            i += 1  # skip the second brace of the pair
        elif inp[i:i+2] == '}}':
            lvl -= 1
            i += 1  # skip the second brace of the pair
        else:
            if lvl < 1:
                chars.append(inp[i])
        i += 1
    result = ''.join(chars)
    # If you do not want runs of contiguous whitespace, add this line:
    result = re.sub(r'\s\s+', r' ', result)
    return result

inp = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there."
print(rem_bra(inp))
# Prints: Hello there.
Upvotes: 1
Reputation: 474201
You can use the pyparsing module here. Solution based on this answer:
from pyparsing import nestedExpr
s = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend."
expr = nestedExpr('{{', '}}')
result = expr.parseString("{{" + s + "}}").asList()[0]
print(" ".join(item for item in result if not isinstance(item, list)))
Prints:
Hello there friend.
If there is only one top-level pair of braces, you can remove everything inside the double curly braces, along with the braces themselves, using a greedy regex:
>>> import re
>>>
>>> s = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there."
>>> re.sub(r"\{\{.*\}\} ", "", s)
'Hello there.'
The pattern \{\{.*\}\} followed by a space matches an opening pair of curly braces, then any characters any number of times (intentionally left "greedy"), then a closing pair of curly braces and a space.
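To show why the greedy pattern is limited to a single top-level group, here is a small sketch that applies it to both examples from the question. The variable names are chosen for illustration only.

```python
import re

# Greedy: .* runs to the LAST '}} ' in the string, not the matching one.
greedy = r"\{\{.*\}\} "

one_group = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there."
two_groups = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend."

one_result = re.sub(greedy, "", one_group)
two_result = re.sub(greedy, "", two_groups)

print(one_result)  # Hello there.
print(two_result)  # Hello friend.
```

With two top-level groups the match spans from the first {{ to the last }}, so the word "there" between the groups is deleted as well.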
Upvotes: 4
Reputation: 2946
Try the following code:
import re
s = 'Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there'
m = re.search('(.*?) {.*}(.*)',s)
result = m.group(1) + m.group(2)
print(result)
Upvotes: 1