miciobauciao

Reputation: 85

Python regex to capture text between innermost brackets

I need a regex that extracts the text between a starting and an ending character (opening and closing brackets in my example) if and only if that text is made up of a specified number of words.

I'm using this (really simple) regex, which works in this case:

re.findall(r"(?<=\()(.*?)(?=\))", "bla bla (bla bla) bla bla")
actual output: ['bla bla']

But it fails in this case:

re.findall(r"(?<=\()(.*?)(?=\))", "bla bla (bla ( bla bla) bla bla")

actual output: ['bla ( bla bla']
desired output: [' bla bla']
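To illustrate why the lazy `.*?` overshoots: it starts right after the first `(` and stops at the first `)`, so an inner `(` ends up inside the match. A minimal sketch (not part of the original post) comparing it with a negated character class `[^()]*`, which cannot cross a bracket and therefore only matches an innermost pair. The variable names are illustrative.

```python
import re

text = "bla bla (bla ( bla bla) bla bla"

# Lazy version from the question: begins after the FIRST "(" and ends at
# the first ")", so the inner "(" is included in the captured text.
lazy = re.findall(r"(?<=\()(.*?)(?=\))", text)

# [^()]* cannot contain a bracket, so a match can only span a pair of
# brackets with no other bracket between them (an innermost pair).
innermost = re.findall(r"\(([^()]*)\)", text)

print(lazy)
print(innermost)
```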

I'm wondering whether it's possible to extend the (.*?) part so that it only matches when a condition holds. For example, capture the text between two brackets only if that text is composed of exactly two words:

re.findall(r"(?<=\()(.*?)(?=\))", "bla bla (bla ( bla bla) bla bla (bla bla bla) bla")
desired output: [' bla bla']
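One way to express the condition without putting it all into the regex is a two-step approach: first collect the contents of innermost brackets, then filter them by word count in Python. This is a sketch under two assumptions: `innermost_with_n_words` is a hypothetical helper name, and a "word" is taken to be a whitespace-separated token (as `str.split()` defines it).

```python
import re

def innermost_with_n_words(text, n):
    """Hypothetical helper: return the contents of innermost (...) pairs
    whose text consists of exactly n whitespace-separated words."""
    # [^()]* cannot contain a bracket, so each match is an innermost pair.
    candidates = re.findall(r"\(([^()]*)\)", text)
    # str.split() with no argument splits on runs of whitespace; the raw
    # captured text (including any padding) is what gets returned.
    return [c for c in candidates if len(c.split()) == n]

text = "bla bla (bla ( bla bla) bla bla (bla bla bla) bla"
print(innermost_with_n_words(text, 2))
print(innermost_with_n_words(text, 3))
```

Keeping the word-count rule in Python makes it easy to change (for example to "at most n words") without rewriting the pattern.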

Can you help me?

Upvotes: 0

Views: 451

Answers (1)

The fourth bird

Reputation: 163277

If you want to match exactly 2 words that do not contain the characters ( and ) between (...), you can use a capture group to get the inner value:

\((\s*[^\s()]+\s+[^\s()]+\s*)\)

The pattern matches:

  • \( Match (
  • ( Capture group 1
    • \s* Match optional leading whitespace chars
    • [^\s()]+ Match 1+ non-whitespace chars other than ( and )
    • \s+ Match 1+ whitespace chars between the words
    • [^\s()]+ Match 1+ non-whitespace chars other than ( and )
    • \s* Match optional trailing whitespace chars
  • ) Close group 1
  • \) Match )
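Since the question asks for a specified number of words, the same idea can be generalized by repeating the "whitespace + word" part n-1 times with a `{m}` quantifier. A sketch, assuming `n_word_pattern` as a hypothetical helper (it is not part of the original answer); for n=2 it builds a pattern equivalent to the one above.

```python
import re

def n_word_pattern(n):
    """Hypothetical helper: build a pattern matching (...) that contains
    exactly n words, where a word is 1+ chars other than whitespace, ( and )."""
    if n < 1:
        raise ValueError("n must be at least 1")
    word = r"[^\s()]+"
    # First word, then n-1 repetitions of "whitespace + word".
    # In the f-string, {{ and }} produce literal braces around n - 1.
    return rf"\((\s*{word}(?:\s+{word}){{{n - 1}}}\s*)\)"

s = ("bla bla (bla bla) bla bla\n"
     "bla bla (bla ( bla bla) bla bla\n"
     "bla bla (bla ( test5 ) bla bla\n"
     "bla bla (bla ( test6 test7 test8 ) bla bla")

for n in (1, 2, 3):
    print(n, re.findall(n_word_pattern(n), s))
```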


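import re

# Group 1 captures exactly two words (with optional padding) between ( and )
pattern = r"\((\s*[^\s()]+\s+[^\s()]+\s*)\)"

s = ("bla bla (bla bla) bla bla\n"
     "bla bla (bla ( bla bla) bla bla\n"
     "bla bla (bla ( test5 ) bla bla\n"
     "bla bla (bla ( test6 test7 test8 ) bla bla")

# With one capture group, findall returns only the group 1 text of each match
print(re.findall(pattern, s))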

Output

['bla bla', ' bla bla']
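Because the pattern has exactly one capture group, `re.findall` returns only that group's text, including the leading space seen in `' bla bla'`. If you also need the full match, its position, or the text without padding, `re.finditer` gives you match objects instead. A small sketch; the sample string is made up for illustration.

```python
import re

pattern = r"\((\s*[^\s()]+\s+[^\s()]+\s*)\)"
s = "bla bla (bla ( bla bla) bla bla (bla bla) end"

# group(0) is the whole match including the brackets, group(1) is the
# captured inner text; strip() removes the optional padding.
matches = [(m.span(), m.group(0), m.group(1).strip())
           for m in re.finditer(pattern, s)]

for span, full, inner in matches:
    print(span, repr(full), repr(inner))
```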

Upvotes: 2
