John

Reputation: 31

Can't find upper case letter in URL using Regex

I have the following regex:

(href[\s]?=[\s]?)(\"[^"]*\/*[^"]*\")

using the following Test String:

href="http://mysite.io/Plan-documents"

I get two capture groups: one with the href= and the other with everything after it. Now I want to match only when there is an uppercase letter anywhere in the second capture group. I tried:

(href[\s]?=[\s]?)(\"[A-Z]*[^"]*\/*[^"]*\")

so that the regex only returns URLs that contain an uppercase letter. No luck. Even if I change the test string to:

 href="http://mysite.io/plan-documents"

I still get a match. I only want to match the href string if there is at least one uppercase letter in the string after the href=.

Thanks.

Upvotes: 0

Views: 383

Answers (1)

The fourth bird

Reputation: 163457

You don't get the right matches because every part between the double quotes in your second capturing group uses the quantifier *, which matches 0 or more times.

First the engine tries [A-Z]*. No uppercase letter is present at that position, but that is fine because of the 0+ times quantifier. Then the next part, [^"]*, matches up to right before the next ".

The following \/* matches nothing at that point, which is also fine because of the 0+ times quantifier, and it is followed by [^"]*, which can also match nothing.
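To see this concretely, here is a minimal Python sketch (Python is an assumption; the question does not name a language). ATTEMPT is the pattern from the question. INSTRUMENTED is a hypothetical variant, not from the question, that gives each quantified piece its own group so you can inspect what each one consumed.

```python
import re

# The asker's second attempt, copied from the question.
ATTEMPT = re.compile(r'(href[\s]?=[\s]?)(\"[A-Z]*[^"]*\/*[^"]*\")')

# Hypothetical instrumented variant: same pieces, one group per quantified part.
INSTRUMENTED = re.compile(r'href[\s]?=[\s]?"([A-Z]*)([^"]*)(\/*)([^"]*)"')


def matches_attempt(text):
    """Return True if the asker's pattern finds a match anywhere in text."""
    return ATTEMPT.search(text) is not None


def split_parts(text):
    """Return what each of the four quantified pieces consumed, or None."""
    m = INSTRUMENTED.search(text)
    return m.groups() if m else None


for sample in ('href="http://mysite.io/Plan-documents"',
               'href="http://mysite.io/plan-documents"'):
    print(sample, matches_attempt(sample), split_parts(sample))
```

For both test strings [A-Z]* consumes nothing and the first [^"]* consumes the whole URL, so the presence of an uppercase letter never affects the result.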

What you might do instead is first match characters that are not uppercase until you reach an uppercase letter, and then match up to the closing double quote.

(href\s?=\s?)("[^A-Z\s]*[A-Z][^\s"]*")

Explanation

  • (href\s?=\s?) Capture group, match href, then = with an optional whitespace char on either side
  • (" Start capture group and match "
    • [^A-Z\s]* Match 0+ times any char except an uppercase letter or a whitespace char
    • [A-Z] Match 1 uppercase char
    • [^\s"]* Match 0+ times any char except a whitespace char or "
  • ") Match " and close capture group
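A quick way to check the pattern against the two test strings from the question is a small Python sketch like the one below. Python and the helper name url_with_uppercase are assumptions for illustration; the pattern itself is the one given above.

```python
import re

# Pattern from the answer: the second group requires at least one [A-Z].
FIXED = re.compile(r'(href\s?=\s?)("[^A-Z\s]*[A-Z][^\s"]*")')


def url_with_uppercase(text):
    """Return the quoted URL (group 2) if it contains an uppercase letter, else None."""
    m = FIXED.search(text)
    return m.group(2) if m else None


print(url_with_uppercase('href="http://mysite.io/Plan-documents"'))
print(url_with_uppercase('href="http://mysite.io/plan-documents"'))
```

The first call returns the quoted URL because [A-Z] can match the P. The second returns None because no uppercase letter is available for the mandatory [A-Z].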


Without using groups, you could use:

href\s?=\s?"[^A-Z\s]*[A-Z][^\s"]*"
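Without groups, the pattern is convenient for pulling every matching attribute out of a larger text. The following is a rough Python sketch; Python, the SAMPLE markup, and the function name find_uppercase_hrefs are all made up for illustration.

```python
import re

# Group-free pattern from the answer.
HREF_WITH_UPPER = re.compile(r'href\s?=\s?"[^A-Z\s]*[A-Z][^\s"]*"')

# Hypothetical sample markup: only the first and third URLs contain uppercase.
SAMPLE = (
    '<a href="http://mysite.io/Plan-documents">first</a>\n'
    '<a href="http://mysite.io/plan-documents">second</a>\n'
    '<a href = "/docs/FAQ">third</a>'
)


def find_uppercase_hrefs(text):
    """Return every href attribute whose quoted value matches the pattern."""
    return HREF_WITH_UPPER.findall(text)


for hit in find_uppercase_hrefs(SAMPLE):
    print(hit)
```

One caveat: [^A-Z\s]* does not exclude ", so with input such as href="/a"class="Big" (no whitespace between attributes) the match can run past the closing quote of the URL. If that matters for your data, adding " to the first character class may be worth considering.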


Upvotes: 3
