Ryan Peschel

Reputation: 11756

Regular expression a little too greedy

I am trying to capture a small subset of a document with this RegEx:

preg_match('/href="(.+?)".+?>Keyword/s', $a, $b);

However, instead of just grabbing the href= immediately before the Keyword, the match starts at the first href= in the document and runs all the way down to the Keyword.

How can I make it so it backtracks and only keeps the href= immediately before Keyword?
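To make the behaviour concrete, here is a minimal reproduction. The two-link input is a hypothetical sample, not the real document; the pattern is the one above.

```php
<?php
// Hypothetical sample input: two links, only the second wraps "Keyword".
$a = '<a href="first.html">Other</a>' . "\n"
   . '<a href="second.html">Keyword</a>';

// Pattern from the question, unchanged.
preg_match('/href="(.+?)".+?>Keyword/s', $a, $b);

// The engine tries the leftmost href= first. The lazy .+? is allowed to
// keep expanding (across tags and, with /s, across newlines) until
// ">Keyword" finally matches, so the match succeeds from the first link.
var_dump($b[1]);
```

Lazy quantifiers only prefer the shortest expansion from a given starting point; they do not move the starting point closer to Keyword.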

Upvotes: 0

Views: 106

Answers (2)

nhahtdh

Reputation: 56809

If, in the input, the link text is on the same line as its anchor tag, you can remove the s flag. Without it, . does not match newlines, so the match cannot start at an href= on an earlier line.

Otherwise, you need a more specific regex:

'/href="[^"]*"[^<>]*>Keyword/'

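This assumes that the link inside href does not contain ". The [^<>]* cannot cross a tag boundary, which prevents other tags from being part of the match. If you still need the URL captured, as in your original pattern, wrap [^"]* in parentheses.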
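A short sketch of this pattern in use, with a capture group added around the URL. The sample input and its extra attributes are made up for illustration.

```php
<?php
// Hypothetical sample input: the second tag has extra attributes around
// href, and the two links sit on different lines.
$a = '<a href="first.html">Other</a>' . "\n"
   . '<a class="x" href="second.html" title="t">Keyword</a>';

// Same pattern as above, plus parentheses to capture the URL.
// [^"]*  : the URL, assumed to contain no double quote
// [^<>]* : the rest of the same tag; it cannot step over < or >
preg_match('/href="([^"]*)"[^<>]*>Keyword/', $a, $b);

// The attempt at the first href= fails because ">Keyword" does not
// follow that tag, so the engine moves on to the second href=.
var_dump($b[1]);
```

Negated character classes match newlines regardless of the s flag, so a tag whose attributes are split across lines is still handled.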

Upvotes: 2

orique

Reputation: 1303

Assuming no " can be inside the href attribute, you could start tuning your regex with this:

preg_match('/href="([^"]+?)".+?>Keyword/s', $a, $b);
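One caveat worth checking before relying on this as-is: the captured group can no longer run past a quote, but the .+? after it can still cross tags. A quick check on a hypothetical two-link input:

```php
<?php
// Hypothetical sample input: only the second link wraps "Keyword".
$a = '<a href="first.html">Other</a> <a href="second.html">Keyword</a>';

// Pattern from this answer, unchanged.
preg_match('/href="([^"]+?)".+?>Keyword/s', $a, $b);

// Group 1 is confined to a single attribute value, but the trailing .+?
// still expands over the rest of the first tag and into the second one,
// so the match begins at the first href=.
var_dump($b[1]);
```

Swapping that .+? for something that cannot cross a tag boundary, such as [^<>]*, is one way to continue the tuning.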

Upvotes: 0
