alphanumeric

Reputation: 19379

How to use REGEX with multiline

The following expression correctly extracts the portion of the data string that starts with the word Block followed by an opening brace { and ends with the closing brace }:

import re

data = """
Somewhere over the rainbow
Way up high 
Block {
 line 1
 line 2
 line 3
}
And the dreams that you dreamed of
Once in a lullaby
"""
# Raw string so the backslash escapes reach the regex engine unchanged.
regex = re.compile(r"(Block\ {\n\ [^\{\}]*\n}\n)", re.MULTILINE)
result = regex.findall(data)
print(result)

which returns:

['Block {\n line 1\n line 2\n line 3\n}\n']

But if there is another curly brace inside the Block portion of the string, the expression no longer matches and findall returns an empty list:

data ="""
Somewhere over the rainbow
Way up high 
Block {
 line 1
 line 2
 {{}
 line 3
}
And the dreams that you dreamed of
Once in a lullaby
Block {
 line 4
 line 5
 {{
 }
 line 6
}
Somewhere over the rainbow
Blue birds fly
And the dreams that you dreamed of
Dreams really do come true ooh oh
"""

How can I modify this regex so that it ignores the braces inside the Blocks and still returns each Block as a separate item in the result list (so that each Block can be accessed separately)?

Upvotes: 0

Views: 58

Answers (2)

1574ad6

Reputation: 85

I would suggest using:

(Block ?{\n ?[^$]+?\n}\n)

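Python quantifiers are greedy by default, so the ? after + makes it non-greedy and the match stops at the first closing brace that stands alone on a line. The character class [^$] matches anything except a literal $, including newlines and braces.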

This worked well for me. I would also recommend testing patterns at https://regex101.com/
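As a quick check, here is a minimal sketch that runs the pattern above against a shortened copy of the question's second sample. The shortened data and the variable names are assumptions made for illustration; the pattern itself is unchanged apart from being written as a raw string.

```python
import re

# Shortened copy of the question's second sample: both blocks contain
# stray braces on indented lines.
data = """
Way up high
Block {
 line 1
 {{}
 line 3
}
Once in a lullaby
Block {
 line 4
 {{
 }
 line 6
}
Blue birds fly
"""

# Pattern from this answer, written as a raw string.
# [^$]+? lazily matches any character except a literal "$", newlines included,
# so each match ends at the first "}" that sits alone on its own line.
regex = re.compile(r"(Block ?{\n ?[^$]+?\n}\n)")
blocks = regex.findall(data)

# Each block is a separate list item and can be accessed by index.
for block in blocks:
    print(repr(block))
```

Note that this relies on the inner braces being indented, as they are in the sample; an unindented } on its own line inside a block would end the match early.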

Best Regards

Upvotes: 0

cchamberlain

Reputation: 17964

Wouldn't this work?

regex = re.compile("""(Block\ {\n\ [^\}]*\n}\n)""", re.MULTILINE)

The version you posted abandons the match as soon as it meets a second opening brace, because the character class [^\{\}] excludes { as well as }, even though you only want to stop at the first closing brace. Nested opening and closing braces are another story: [^\}]* still cannot cross a } that appears inside the block.
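For the nested case, one possible sketch (a different technique from the pattern above, not a fix to it) is to anchor on line boundaries instead of excluding brace characters. It assumes, as in the question's sample, that a block starts with Block { at the beginning of a line and ends with a } alone at the beginning of a line, and that every inner brace is indented. The shortened data and variable names are illustrative.

```python
import re

# Shortened copy of the question's second sample.
data = """
Way up high
Block {
 line 1
 {{}
 line 3
}
Once in a lullaby
Block {
 line 4
 {{
 }
 line 6
}
Blue birds fly
"""

# re.MULTILINE makes ^ match at the start of every line.
# re.DOTALL lets . match newlines, so .*? can span the block body.
# The lazy .*? stops at the first line that begins with "}".
pattern = re.compile(r"^Block \{\n.*?^\}\n", re.MULTILINE | re.DOTALL)
blocks = pattern.findall(data)

for block in blocks:
    print(repr(block))
```

This does not count braces, so it would not handle a block whose closing brace shares a line with other text, or an inner } that starts a line.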

Upvotes: 1
