Reputation: 808
I am trying to remove the parenthesized parts, (left) and (right), from this string:
Snowden (left), whose whereabouts remain unknown, made the extraordinary claim as his father, Lon (right), told US television he intended to travel
I am using the following regex: ([(].*[)]), but it's matching:
(left), whose whereabouts remain unknown, made the extraordinary claim as his father, Lon (right)
Which makes sense, but isn't what I want.
What can I do to solve this? Does it have something to do with greedy or lazy?
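A minimal reproduction of the problem, assuming Python 3's `re` module, shows that it does come down to greediness. `.*` first runs to the end of the string, then backtracks only as far as the last `)`.

```python
import re

text = ("Snowden (left), whose whereabouts remain unknown, made the extraordinary "
        "claim as his father, Lon (right), told US television he intended to travel")

# .* is greedy: it consumes everything it can, then gives characters back
# only until the final [)] can match, which is the LAST ")" in the string.
greedy = re.search(r'([(].*[)])', text)
print(greedy.group(1))
```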
EDIT:
I am using Python:
paren = re.findall(ur'([(\u0028][^)\u0029]*[)\u0029])', text, re.UNICODE)
if paren is not None:
    text = re.sub(s, '', text)
This leads to the following output:
Snowden (), whose whereabouts remain unknown, made the extraordinary claim as his father, Lon (), told US television he intended to travel
However, when I print paren.group(0), I get "(left)", meaning the parentheses are included. Why is this?
Thanks.
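A sketch of what is going on, assuming Python 3 (the `ur''` prefix above is Python 2 only). `re.findall` returns a plain list of strings, never `None`, so it has no `.group()` method. Because the pattern puts the bracket characters themselves inside the match, every item includes the parentheses. To delete the parentheticals, you can pass the same pattern to `re.sub`. The `\s*` variant at the end is an extra assumption on my part, added to also eat the space before each one.

```python
import re

text = ("Snowden (left), whose whereabouts remain unknown, made the extraordinary "
        "claim as his father, Lon (right), told US television he intended to travel")

pattern = r'\([^)]*\)'  # "(", then any characters other than ")", then ")"

# findall returns a list of whole matches, parentheses included
paren = re.findall(pattern, text)
print(paren)  # ['(left)', '(right)']

# re.sub deletes each match, parentheses and all; the preceding space remains
cleaned = re.sub(pattern, '', text)

# Optional: also swallow the whitespace in front of each parenthetical
tidy = re.sub(r'\s*\([^)]*\)', '', text)
print(tidy)
```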
Upvotes: 0
Views: 5145
Reputation: 3631
It's a matter of style, but I prefer [(] to \(, so I would use ([(][^)]*[)])
You haven't specified which language you are using. If it is Perl, I would use the /x modifier, which lets me add whitespace for clarity:
/ ( [(] [^)]* [)] ) /x
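Since the question turns out to be about Python, the closest counterpart to Perl's `/x` is the `re.VERBOSE` flag (also spelled `re.X`). Here is a sketch of the same pattern with comments. In verbose mode, unescaped whitespace outside character classes is ignored and `#` starts a comment.

```python
import re

paren_re = re.compile(r"""
    (           # start capture group
      [(]       # literal opening parenthesis
      [^)]*     # any run of characters other than ")"
      [)]       # literal closing parenthesis
    )           # end capture group
""", re.VERBOSE)

# With exactly one capture group, findall returns that group's text
print(paren_re.findall("Snowden (left) ... Lon (right)"))  # ['(left)', '(right)']
```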
Upvotes: 0
Reputation: 35404
As pguardiario mentioned (and I upvoted that answer), you don't need a character class; just escape the parentheses.
That solution will work, with one caveat: if the text within the parentheses is hard-wrapped, the . won't match \n by default. You need a character class for that.
My proposed solution:
\([^)]*\)
This escapes the parentheses on either end and will always capture whatever is between them (unless it contains another parenthetical clause, of course).
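A quick check of the newline caveat, assuming Python 3's `re` module and a made-up wrapped string. Without flags, `.` refuses `\n`, so the lazy pattern finds nothing. Turning on `re.DOTALL` is an alternative fix, while the negated class `[^)]` matches newlines with no flag at all.

```python
import re

# Hypothetical parenthetical that has been hard-wrapped across two lines
wrapped = "Snowden (left,\nin photo) said"

dot_lazy = re.findall(r'\(.*?\)', wrapped)             # "." stops at "\n"
dot_all = re.findall(r'\(.*?\)', wrapped, re.DOTALL)   # "." now matches "\n" too
negated = re.findall(r'\([^)]*\)', wrapped)            # [^)] matches "\n" already

print(dot_lazy)  # []
print(negated)   # ['(left,\nin photo)']
```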
Upvotes: 0
Reputation: 55002
Second, use .*? for a non-greedy match:
/\(.*?\)/
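In Python there are no `/.../` delimiters; the same pattern goes in a raw string. A minimal sketch, assuming Python 3's `re` module: the lazy `.*?` expands one character at a time and stops at the first `)` it reaches.

```python
import re

text = ("Snowden (left), whose whereabouts remain unknown, made the extraordinary "
        "claim as his father, Lon (right), told US television he intended to travel")

# Lazy quantifier: the shortest run that still lets "\)" match
lazy = re.findall(r'\(.*?\)', text)
print(lazy)  # ['(left)', '(right)']
```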
Upvotes: 1
Reputation: 1927
Use the negation: ([(][^)]*[)]). This will match the opening (, then any number of characters that are not a closing ), then the closing ) itself.
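To illustrate why the negation sidesteps the greedy/lazy question, here is a sketch assuming Python 3's `re` module. `[^)]*` is still greedy, but it can never consume a `)`, so each match ends at the first closing parenthesis whether the quantifier is greedy or lazy.

```python
import re

text = ("Snowden (left), whose whereabouts remain unknown, made the extraordinary "
        "claim as his father, Lon (right), told US television he intended to travel")

greedy_negated = re.findall(r'([(][^)]*[)])', text)
lazy_negated = re.findall(r'([(][^)]*?[)])', text)  # same result: laziness is moot here

print(greedy_negated)                  # ['(left)', '(right)']
print(greedy_negated == lazy_negated)  # True
```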
You can negate any character or set of characters in this way. To match a literal ^ caret, escape it as \^ (an unescaped ^ outside a [] character set is a start-of-string anchor), or put it anywhere after the first position inside the set, like so: [a^bc]. It is always a good idea to read the rules of the regular expression language you are working in, so you know exactly what is possible and what the correct syntax is.
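A small sketch of the caret rules in Python 3's `re` module, using a made-up three-character string. Only a leading `^` inside `[]` negates the set. Anywhere else in the set it is literal, and outside a set it must be escaped to mean a caret.

```python
import re

sample = "x^a"

negated = re.findall(r'[^abc]', sample)  # leading ^: anything except a, b, c
literal = re.findall(r'[a^bc]', sample)  # ^ after the first position is literal
escaped = re.findall(r'\^', sample)      # escaped caret outside a set
anchor = re.findall(r'^', sample)        # unescaped: matches the empty string at the start

print(negated)  # ['x', '^']
print(literal)  # ['^', 'a']
```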
Greedy versus lazy matching is one feature that might not be implemented the same way (if at all) in every regular expression implementation. It is better to say explicitly what you want to find than to depend on a rule that can be difficult to understand and debug.
Upvotes: 5
Reputation: 4373
Restrict the .* to match only things that aren't parentheses:
([(][^()]*[)])
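The difference between `[^)]*` and `[^()]*` only shows up with nested parentheses. Here is a sketch assuming Python 3's `re` module and a made-up nested example. Excluding both brackets means only an innermost group can match, and applying `re.subn` until nothing changes (my own addition, not part of the answer) peels nested groups from the inside out.

```python
import re

nested = "Lon (right (in photo)) said"

# [^)]* may cross an inner "(", so the match starts at the outer "("
close_only = re.findall(r'[(][^)]*[)]', nested)
# [^()]* refuses both brackets, so only the innermost group matches
both = re.findall(r'[(][^()]*[)]', nested)

# Hypothetical clean-up loop: delete innermost groups until none remain
stripped = nested
while True:
    stripped, count = re.subn(r'[(][^()]*[)]', '', stripped)
    if count == 0:
        break

print(close_only)  # ['(right (in photo)']
print(both)        # ['(in photo)']
```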
Upvotes: 1