Reputation: 1
Okay, I have read many similar questions and tried them out but it's not working for some reason. I have a file with a bunch of lines that look like this:
Here are some words:
"<Hello> (silly girl) that isn't what she want(s)"
I am trying to search for text of two or more characters within parentheses. Many combinations of re.search and group() return something, but not exactly what I'm looking for. The value I want returned and printed in this case is: "silly girl".
Right now I have this:
regex = re.compile("\((.+.+)\)")
for line in lines:
    m = re.search(regex, line)
    if m:
        print(m.group())
The above prints:
(silly girl) that isn't what she want(s)
If I change the group index to 1, as in print(m.group(1)), it prints the same thing just without the first parenthesis:
silly girl) that isn't what she want(s)
What am I doing wrong?
Upvotes: 0
Views: 244
Reputation: 121987
Regular expressions are greedy by default, so your pattern captures from the first '(' (before 'silly') to the last ')' (after 'want(s'). Instead, use:
- a lazy quantifier, '?';
- '[^()]' rather than '.' to exclude parentheses from the match (thanks to @thg435, and see their comment on the question for a potential drawback); and
- '{2,}' to indicate "two or more", rather than two separate "one or more" '+'s.
Now you have:
regex = re.compile(r"\(([^()]{2,}?)\)")
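To see the difference side by side, here is a minimal sketch that runs the pattern from the question and the revised pattern against the same sample line. The variable names (greedy, bounded, and so on) are just illustrative.

```python
import re

s = "<Hello> (silly girl) that isn't what she want(s)"

# Pattern from the question: the greedy .+.+ runs on to the last ')' in the line.
greedy = re.compile(r"\((.+.+)\)")

# Revised pattern: the captured text cannot contain '(' or ')',
# so the match has to stop at the first closing parenthesis.
bounded = re.compile(r"\(([^()]{2,}?)\)")

greedy_match = greedy.search(s).group(1)
bounded_match = bounded.search(s).group(1)

print(greedy_match)   # silly girl) that isn't what she want(s
print(bounded_match)  # silly girl
```

Note that "(s)" is skipped by the revised pattern because it holds only one character between the parentheses.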
This lets you switch to findall to get a list of results:
>>> import re
>>> regex = re.compile(r"\(([^()]{2,}?)\)")
>>> s = "<Hello> (silly girl) that isn't what she want(s)"
>>> m = re.findall(regex, s)
>>> m
['silly girl']
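Applied to the loop from the question, the same idea might look like the sketch below. The contents of lines are made up for illustration (the question does not show the whole file), and collecting everything into a single results list is just one possible choice.

```python
import re

regex = re.compile(r"\(([^()]{2,}?)\)")

# Hypothetical sample data standing in for the lines read from the file.
lines = [
    "<Hello> (silly girl) that isn't what she want(s)",
    "no parentheses here",
    "(ab) and (c) and (def)",
]

results = []
for line in lines:
    # With a single capturing group, findall returns the captured strings.
    results.extend(regex.findall(line))

print(results)  # ['silly girl', 'ab', 'def']
```

Lines without a qualifying match simply contribute nothing, so no explicit "if m:" check is needed.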
Upvotes: 3