Reputation: 19379
The following expression works well for extracting the portion of the data string that starts with the word Block, followed by an opening brace {, and ends with the closing brace }:
import re

data = """
Somewhere over the rainbow
Way up high
Block {
 line 1
 line 2
 line 3
}
And the dreams that you dreamed of
Once in a lullaby
"""
# Raw string, so the regex escapes are not also read as string escapes
regex = re.compile(r"""(Block\ {\n\ [^\{\}]*\n}\n)""", re.MULTILINE)
result = regex.findall(data)
print(result)
which returns:
['Block {\n line 1\n line 2\n line 3\n}\n']
But if there is another curly brace inside the Block portion of the string, the expression breaks and returns an empty list:
data = """
Somewhere over the rainbow
Way up high
Block {
 line 1
 line 2
 {{}
 line 3
}
And the dreams that you dreamed of
Once in a lullaby
Block {
 line 4
 line 5
 {{
 }
 line 6
}
Somewhere over the rainbow
Blue birds fly
And the dreams that you dreamed of
Dreams really do come true ooh oh
"""
How can I modify this regular expression so that it ignores the braces inside each Block, yet still returns each block as a separate entry in the result list (so each Block can be accessed separately)?
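A minimal Python 3 sketch reproducing the behavior described in the question. It assumes the inner lines are indented by one space (as the output above suggests) and that each block's own closing brace starts its line; FIRST and SECOND are hypothetical, trimmed copies of the two data strings. The character class [^\{\}] cannot step over an inner "{", so the second string yields no matches:

```python
import re

# Hypothetical, trimmed copies of the two data strings above; inner lines
# are assumed to be indented by one space.
FIRST = "Way up high\nBlock {\n line 1\n line 2\n line 3\n}\nAnd the dreams\n"
SECOND = (
    "Way up high\n"
    "Block {\n line 1\n line 2\n {{}\n line 3\n}\n"
    "Once in a lullaby\n"
    "Block {\n line 4\n line 5\n {{\n }\n line 6\n}\n"
    "Blue birds fly\n"
)

# The question's pattern, written as a raw string.
ORIGINAL = re.compile(r"(Block\ {\n\ [^\{\}]*\n}\n)")

# Works when the block body contains no braces at all.
print(ORIGINAL.findall(FIRST))
# [^\{\}]* stops at the first inner "{" and no "\n}\n" comes before it,
# so neither block in SECOND matches.
print(ORIGINAL.findall(SECOND))  # []
```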
Upvotes: 0
Views: 58
Reputation: 85
I would suggest using:
(Block ?{\n ?[^$]+?\n}\n)
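Quantifiers in Python's re module are greedy by default, so the ? after + makes the repetition non-greedy (lazy): each match stops at the first \n}\n instead of running on to the last one in the string.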
It worked well for me. I would also recommend https://regex101.com/ for testing patterns like this.
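A minimal Python 3 sketch of this suggestion, run on a hypothetical trimmed copy of the question's second data string (inner lines assumed indented by one space, each block's own closing brace at the start of a line). Note that [^$] simply means "any character except a literal $", newlines included. The ANCHORED pattern is an assumed alternative, not part of this answer: it states the same intent with the DOTALL and MULTILINE flags.

```python
import re

# Hypothetical copy of the question's second data string; inner lines are
# assumed to be indented by one space.
SECOND = (
    "Way up high\n"
    "Block {\n line 1\n line 2\n {{}\n line 3\n}\n"
    "Once in a lullaby\n"
    "Block {\n line 4\n line 5\n {{\n }\n line 6\n}\n"
    "Blue birds fly\n"
)

# The answer's pattern: [^$] matches any character except "$" (including
# newlines) and +? is lazy, so each match ends at the first "\n}\n".
LAZY = re.compile(r"(Block ?{\n ?[^$]+?\n}\n)")

# Alternative (an assumption, not from the answer): DOTALL lets "." cross
# newlines, MULTILINE lets "^" require the closing brace at line start.
ANCHORED = re.compile(r"^Block \{\n.*?^\}\n", re.DOTALL | re.MULTILINE)

for block in LAZY.findall(SECOND):
    print(repr(block))

# Both patterns return the same two blocks for this input.
print(LAZY.findall(SECOND) == ANCHORED.findall(SECOND))  # True
```

Neither pattern counts braces: if an inner } ever sat at the very start of a line, both would end the block there.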
Best Regards
Upvotes: 0
Reputation: 17964
Wouldn't this work?
regex = re.compile(r"""(Block\ {\n\ [^\}]*\n}\n)""", re.MULTILINE)
In the version you posted, the class [^\{\}] excludes both braces, so the match fails as soon as it reaches a second opening brace, even though you want it to stop only at the first closing brace. If you need to handle nested opening/closing braces, that's another story.
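Python's built-in re module has no recursion, so it cannot match arbitrarily nested braces. For properly nested input, a small brace-depth scanner is one option; the sketch below is illustrative only, and extract_blocks and its keyword parameter are hypothetical names. On the question's own sample it finds nothing, because "{{}" opens two braces and closes only one, so the braces there never balance.

```python
def extract_blocks(text, keyword="Block"):
    """Return every 'keyword {...}' span whose braces balance (hypothetical helper)."""
    blocks = []
    opener = keyword + " {"
    start = text.find(opener)
    while start != -1:
        depth = 0
        # Walk from the block's opening brace, tracking nesting depth.
        for pos in range(text.index("{", start), len(text)):
            char = text[pos]
            if char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
                if depth == 0:
                    blocks.append(text[start:pos + 1])
                    break
        else:
            # Reached the end without closing: braces are unbalanced, stop.
            break
        start = text.find(opener, pos + 1)
    return blocks


NESTED = "Intro\nBlock {\n a {\n  b { c }\n }\n}\nmiddle\nBlock {\n x { y }\n}\nend\n"
for block in extract_blocks(NESTED):
    print(repr(block))

# Unbalanced inner braces, as in the question's sample, yield nothing.
print(extract_blocks("Block {\n {{}\n}\n"))  # []
```

Unlike the regex versions, each returned span ends at the closing brace and omits the trailing newline.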
Upvotes: 1