Regular expression to extract a specific value from HTML anchors

Question

I am trying to extract the http://xyz.com/5 link from the string below. Only that anchor has the class="next" attribute, so I am trying to select it based on that attribute.

I tried the pattern below, but it returns all the links in the entire text.

(I understand from this site that using regular expressions to parse HTML is a bad idea, but I have to do this for now.)

Barmar · Accepted Answer

Try this regexp:

Making a regular expression non-greedy doesn't mean it will always find the shortest possible match. The engine scans left to right and returns the first match it can complete; a non-greedy quantifier only means that, from that starting position, the wildcard consumes as little text as possible instead of as much as possible. Put another way, it gives you the shortest match at the right-hand end of the wildcard, but it does nothing about the left-hand end.
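To illustrate the point about the left-hand end, here is a small Python sketch. The question's real string and pattern are not shown, so both the sample HTML and the lazy pattern below are hypothetical stand-ins (the single-quoted href values are an assumption, chosen to match the [^']+ used later in this answer). The match still begins at the first anchor, because that is the leftmost position from which the engine can reach class="next".

```python
import re

# Hypothetical sample: three anchors, only the second has class="next".
html = (
    "<a href='http://xyz.com/4'>4</a> "
    "<a href='http://xyz.com/5' class=\"next\">5</a> "
    "<a href='http://xyz.com/6'>6</a>"
)

# Hypothetical lazy pattern: .+? expands only as far as needed,
# but the match still starts at the FIRST "<a href='" in the text.
lazy = re.compile(r"<a href='(.+?)' class=\"next\"")

m = lazy.search(html)
print(m.start())   # 0: the match begins at the first anchor
print(m.group(1))  # spans from the first href into the second one
```

The captured group runs from http://xyz.com/4 across the end of the first anchor and stops just before the class="next" attribute, which is exactly the "matches too much" symptom described in the question.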

So your regular expression was matching from the beginning of the first link and continuing until it found class="next". Using [^']+ instead of .+? means the wildcard cannot run past the closing quote of an attribute value, so it cannot cross attribute boundaries and you're assured of matching just one link.
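A minimal Python sketch of the negated-character-class approach, using the same hypothetical sample as before. The exact regexp from this answer is not reproduced here; the pattern below is an illustrative assumption built around [^']+, and it assumes href values are single-quoted (for double-quoted values, [^"]+ would play the same role).

```python
import re

# Hypothetical sample: three anchors, only the second has class="next".
html = (
    "<a href='http://xyz.com/4'>4</a> "
    "<a href='http://xyz.com/5' class=\"next\">5</a> "
    "<a href='http://xyz.com/6'>6</a>"
)

# [^']+ can never consume a single quote, so the capture cannot run
# past the end of one href value into the following anchor.
bounded = re.compile(r"<a href='([^']+)' class=\"next\"")

m = bounded.search(html)
print(m.group(1))  # http://xyz.com/5
```

When the engine tries to start at the first anchor, [^']+ stops at the quote after http://xyz.com/4, the literal ' class="next" fails to match there, and the engine moves on. The only place the whole pattern fits is the anchor that actually carries class="next".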
