Reputation: 31
I have the following regex:
(href[\s]?=[\s]?)(\"[^"]*\/*[^"]*\")
using the following Test String:
href="http://mysite.io/Plan-documents"
I get two capturing groups. One contains the href= and the other contains everything after it. Now I want to only display matches where there is an uppercase letter anywhere in the second capture group. I tried:
(href[\s]?=[\s]?)(\"[A-Z]*[^"]*\/*[^"]*\")
to try and only have this regex come back with URLs that have an uppercase letter in them. No luck. Even if I modify the test string to:
href="http://mysite.io/plan-documents"
I still get a match. I only want to match the href string if there is at least one uppercase letter in the string after the href=.
Thanks.
Upvotes: 0
Views: 383
Reputation: 163457
You don't get the right matches because every part between the double quotes in your second capturing group uses the quantifier *, which matches 0 or more times.
First the engine tries [A-Z]* right after the opening ". There is no uppercase letter at that position, but that is fine because the quantifier allows 0 matches. Next, [^"]* matches everything up to, but not including, the next ". The following \/* matches nothing, which is also fine because of the 0+ quantifier, and the final [^"]* matches nothing as well. The closing \" then matches, so the pattern succeeds whether or not the string contains an uppercase letter.
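To make that behaviour concrete, here is a small sketch in Python (an assumption, since the question does not say which regex engine is in use) that runs the attempted pattern against both test strings from the question. The variable names are only illustrative.

```python
import re

# The pattern tried in the question. [A-Z]* is allowed to match zero
# characters, so it never forces an uppercase letter to be present.
attempt = re.compile(r'(href[\s]?=[\s]?)(\"[A-Z]*[^"]*\/*[^"]*\")')

upper = 'href="http://mysite.io/Plan-documents"'
lower = 'href="http://mysite.io/plan-documents"'

upper_match = attempt.search(upper)
lower_match = attempt.search(lower)

# Both strings match, so the pattern does not filter on uppercase letters.
print(upper_match is not None)
print(lower_match is not None)
```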
What you might do instead is first match any characters that are not uppercase letters, then require one uppercase letter, and then match up to the closing double quote:
(href\s?=\s?)("[^A-Z\s]*[A-Z][^\s"]*")
Explanation
(href\s?=\s?)
Capture group: match href, then =, with an optional whitespace char on each side of the =
("
Start the second capture group and match "
[^A-Z\s]*
Match 0+ times any char that is not an uppercase letter or a whitespace char
[A-Z]
Match 1 uppercase char
[^\s"]*
Match 0+ times any char that is not " or a whitespace char
")
Match " and close the capture group
Without using groups, you could use:
href\s?=\s?"[^A-Z\s]*[A-Z][^\s"]*"
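As a quick check, the grouped pattern can be run against the two test strings from the question. This is a minimal sketch in Python (an assumption, as the question does not name a language), and the sample list and variable names are only illustrative.

```python
import re

# The grouped pattern from this answer: one uppercase letter is mandatory.
fixed = re.compile(r'(href\s?=\s?)("[^A-Z\s]*[A-Z][^\s"]*")')

samples = [
    'href="http://mysite.io/Plan-documents"',  # contains an uppercase P
    'href="http://mysite.io/plan-documents"',  # all lowercase
]

# Keep only the samples where the pattern finds a match.
matched = [s for s in samples if fixed.search(s)]
print(matched)

# The two capture groups for the string that does match.
m = fixed.search(samples[0])
print(m.group(1))
print(m.group(2))
```

Note that [^A-Z\s]* also accepts a double quote, so on input with no whitespace after the closing quote the match could run past it; adding " to that character class is one way to tighten the pattern if that matters for your data.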
Upvotes: 3