Reputation: 85
I need a regex that extracts text between a starting and an ending char (open and close bracket in my example) if and only if such text is made up of a specified number of words.
I'm using this (really simple) regex, which works in this case:
re.findall(r"(?<=\()(.*?)(?=\))", "bla bla (bla bla) bla bla")
actual output: ['bla bla']
But it fails in this case:
re.findall(r"(?<=\()(.*?)(?=\))", "bla bla (bla ( bla bla) bla bla")
actual output: ['bla ( bla bla']
desired output: [' bla bla']
I'm wondering if it's possible to extend the (.*?) part so that it only matches under a condition.
For example, I want to capture the text between two brackets only if it is composed of exactly two words:
re.findall(r"(?<=\()(.*?)(?=\))", "bla bla (bla ( bla bla) bla bla (bla bla bla) bla")
desired output: [' bla bla']
Can you help me?
Upvotes: 0
Views: 451
Reputation: 163277
If you want to match 2 words that do not contain the chars ( and ) between (...), you can use a capture group to get the inner value:
\((\s*[^\s()]+\s+[^\s()]+\s*)\)
The pattern matches:
- \(  : Match a literal (
- (  : Open capture group 1
- \s*  : Match optional leading whitespace chars
- [^\s()]+  : Match 1+ non-whitespace chars other than ( and )
- \s+  : Match 1+ whitespace chars between the words
- [^\s()]+  : Match 1+ non-whitespace chars other than ( and )
- \s*  : Match optional trailing whitespace chars
- )  : Close group 1
- \)  : Match a literal )
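The same breakdown can also be kept next to the pattern itself by compiling it with re.VERBOSE. This is only a sketch of an alternative spelling; the name verbose_pattern is made up for illustration, and the regex is meant to be equivalent to the one-line version.

```python
import re

# Same regex as the one-line version, written with re.VERBOSE so that
# whitespace in the pattern is ignored and each piece carries a comment.
verbose_pattern = re.compile(r"""
    \(            # a literal opening bracket
    (             # open capture group 1
      \s*         # optional leading whitespace
      [^\s()]+    # first word: 1+ chars that are not whitespace or brackets
      \s+         # 1+ whitespace chars between the words
      [^\s()]+    # second word
      \s*         # optional trailing whitespace
    )             # close capture group 1
    \)            # a literal closing bracket
""", re.VERBOSE)

text = "bla bla (bla ( bla bla) bla bla (bla bla bla) bla"
print(verbose_pattern.findall(text))
```

With the last sample string from the question, this should return only the two-word group, since the three-word (bla bla bla) does not fit the pattern.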
import re
pattern = r"\((\s*[^\s()]+\s+[^\s()]+\s*)\)"
s = ("bla bla (bla bla) bla bla\n"
"bla bla (bla ( bla bla) bla bla\n"
"bla bla (bla ( test5 ) bla bla\n"
"bla bla (bla ( test6 test7 test8 ) bla bla")
print(re.findall(pattern, s))
Output
['bla bla', ' bla bla']
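Since the question asks for "a specified number of words", the pattern could be generalized by repeating the "whitespace + word" part n - 1 times with a quantifier. A minimal sketch, assuming the same definition of a word as above (no whitespace, no brackets); the helper name find_bracketed_words and its parameters are hypothetical:

```python
import re

def find_bracketed_words(text, n):
    """Return the inner text of every (...) that holds exactly n words.

    Hypothetical helper: a "word" is assumed to be 1+ chars that are
    neither whitespace nor brackets, as in the pattern above.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    word = r"[^\s()]+"
    repeat = f"{{{n - 1}}}"  # e.g. "{1}" for n == 2
    # one word, then (whitespace + word) repeated n - 1 times
    pattern = rf"\((\s*{word}(?:\s+{word}){repeat}\s*)\)"
    return re.findall(pattern, text)

s = ("bla bla (bla bla) bla bla\n"
     "bla bla (bla ( bla bla) bla bla\n"
     "bla bla (bla ( test5 ) bla bla\n"
     "bla bla (bla ( test6 test7 test8 ) bla bla")

print(find_bracketed_words(s, 2))
print(find_bracketed_words(s, 3))
```

For n = 2 this builds the same regex as above (with an explicit {1} quantifier), so it should give the same result on s.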
Upvotes: 2