user3714458

Reputation: 1

Python Searching and returning text inside parentheses

Okay, I have read many similar questions and tried them out but it's not working for some reason. I have a file with a bunch of lines that look like this:

Here are some words:

"<Hello> (silly girl) that isn't what she want(s)"

I am trying to search for text of two or more characters within parentheses. Many combinations of re.search and group() return something, but not exactly what I'm looking for. The value I want returned and printed in this case is: "silly girl".

Right now I have this:

import re

regex = re.compile(r"\((.+.+)\)")
for line in lines:
    m = regex.search(line)
    if m:
        print(m.group())

The above prints:

(silly girl) that isn't what she want(s)

If I change the group index to 1, as in print(m.group(1)), it prints the same thing without the outer parentheses:

silly girl) that isn't what she want(s

What am I doing wrong?

Upvotes: 0

Views: 244

Answers (1)

jonrsharpe

Reputation: 121987

Regular expression quantifiers are greedy by default, so your pattern captures everything from the first '(' (before 'silly') to the last ')' (after 'want(s'). Instead:

  • Make it a lazy match with '?';
  • Use '[^()]' rather than '.' to exclude parentheses from the match (thanks to @thg435, and see their comment on the question for a potential drawback);
  • Use '{2,}' to indicate "two or more", rather than two separate "one or more" '+'s; and
  • Include a capturing group to exclude the parentheses themselves.
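To see how these choices behave, here is a minimal sketch comparing the greedy, lazy, and character-class forms. The second test string, `t`, is a made-up example (not from the question) chosen to show one case where laziness alone may not be enough; it is not necessarily the drawback mentioned in the comment.

```python
import re

s = "<Hello> (silly girl) that isn't what she want(s)"

# Greedy: runs from the first '(' to the last ')'
greedy = re.search(r"\((.+.+)\)", s).group(1)

# Lazy: stops at the first ')' that allows a match
lazy = re.search(r"\((.{2,}?)\)", s).group(1)

# Character class: the match can never cross a parenthesis
no_parens = re.findall(r"\(([^()]{2,})\)", s)

# Hypothetical input where the lazy dot still spans two groups,
# because "(x)" is too short and the dot is allowed to match ')'
t = "a (x) and (yes)"
lazy_only = re.findall(r"\((.{2,}?)\)", t)
with_class = re.findall(r"\(([^()]{2,})\)", t)

print(greedy)      # silly girl) that isn't what she want(s
print(lazy)        # silly girl
print(no_parens)   # ['silly girl']
print(lazy_only)   # ['x) and (yes']
print(with_class)  # ['yes']
```

Combining the character class with the lazy quantifier does no harm, but in this sketch it is the character class that keeps each match inside a single pair of parentheses.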

Now you have:

regex = re.compile(r"\(([^()]{2,}?)\)")

This lets you switch to findall to get a list of results:

>>> import re
>>> regex = re.compile(r"\(([^()]{2,}?)\)")
>>> s = "<Hello> (silly girl) that isn't what she want(s)"
>>> m = re.findall(regex, s)
>>> m
['silly girl']
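Applied to the loop from the question, this might look like the sketch below. The `lines` list is a hypothetical stand-in for the contents of your file, and the third entry is invented to show a line with more than one match.

```python
import re

regex = re.compile(r"\(([^()]{2,}?)\)")

# Hypothetical sample data standing in for the lines read from the file
lines = [
    "Here are some words:",
    "\"<Hello> (silly girl) that isn't what she want(s)\"",
    "(first one) and (second one)",
]

results = []
for line in lines:
    # findall returns the contents of the capturing group for every match
    for text in regex.findall(line):
        results.append(text)
        print(text)
```

Lines with no match simply contribute nothing, so the `if m:` check from the original loop is no longer needed.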


Upvotes: 3
