I have a large chunk of HTML.
With this:
~<div>(?:.*?)<a[\s]+[^>]*?href[\s]?=[\s"\']+(#_ftnref([0-9]+))["\']+.*?>(?:[^<]+|.*?)?</a>(.*?)</div>~si
I am capturing this:
<div> </div><hr align="left" size="1" width="33%" /><div><p><a title="" href="#_ftnref1">[1]</a> This is not to suggest that there are only two possible arguments to be made in support of blah blah <em>blah</em>.</p></div>
But! I want this:
<div><p><a title="" href="#_ftnref1">[1]</a> This is not to suggest that there are only two possible arguments to be made in support of blah blah <em>blah</em>.</p></div>
Can you help?
PS: (?: ), in contrast to ( ), is used to avoid capturing text. I'm doing that on purpose because I want the returned $matches array to be consistent across several different regexes not mentioned in this post.
Upvotes: 0
Views: 83
Reputation: 145502
If lazy matching with .*? doesn't work, you need to come up with an exclusion pattern.
(?:(?!</div>).)*
This would, for instance, match only within a single div, because it stops before any contained </div>.
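As a rough sketch of how that exclusion pattern might be dropped into the question's regex, here is a version in Python's re module (an assumption for illustration; the thread itself uses PHP's preg functions, and the same pattern should carry over with the ~...~si delimiters). The names html, ORIGINAL and TEMPERED are hypothetical, and the middle of the regex has been simplified slightly while keeping the three capture groups in the same order.

```python
import re

# Sample input taken from the question.
html = (
    '<div> </div><hr align="left" size="1" width="33%" />'
    '<div><p><a title="" href="#_ftnref1">[1]</a> This is not to suggest '
    'that there are only two possible arguments to be made in support of '
    'blah blah <em>blah</em>.</p></div>'
)

# The question's pattern: the lazy .*? may run across the first </div>.
ORIGINAL = re.compile(
    r'<div>(?:.*?)<a[\s]+[^>]*?href[\s]?=[\s"\']+(#_ftnref([0-9]+))["\']+'
    r'.*?>(?:[^<]+|.*?)?</a>(.*?)</div>',
    re.S | re.I,
)

# Same idea, but every "anything" gap refuses to step over a </div>.
TEMPERED = re.compile(
    r'<div>(?:(?!</div>).)*?'                                    # stay inside this div
    r'<a\s+[^>]*?href\s*=\s*["\'](#_ftnref([0-9]+))["\'][^>]*>'  # groups 1 and 2
    r'[^<]*</a>'                                                 # link text, not captured
    r'((?:(?!</div>).)*)'                                        # group 3: rest of the div
    r'</div>',
    re.S | re.I,
)

before = ORIGINAL.search(html)
after = TEMPERED.search(html)

print(before.group(0)[:20])   # starts at the empty first div
print(after.group(0)[:20])    # starts at the footnote div
print(after.group(1), after.group(2))
```

Because the lookahead is checked before every character, the match attempt that begins at the empty first div fails as soon as it reaches that div's closing tag, and the engine moves on to the div that actually contains the footnote link.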
Alternatively, a length constraint could serve as a workaround:
(?:.{0,20})
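A minimal sketch of the length-constraint idea, again in Python's re as a stand-in for PHP (an assumption), with hypothetical names sample and BOUNDED. The limit of 20 characters is the arbitrary figure from the answer; it only works while the markup between the opening div and the link stays shorter than that, so treat it as a heuristic rather than a robust fix.

```python
import re

# Sample input taken from the question.
sample = (
    '<div> </div><hr align="left" size="1" width="33%" />'
    '<div><p><a title="" href="#_ftnref1">[1]</a> This is not to suggest '
    'that there are only two possible arguments to be made in support of '
    'blah blah <em>blah</em>.</p></div>'
)

# The gap between <div> and the link is capped at 20 characters, so a match
# cannot start at a div that sits far ahead of the footnote anchor.
BOUNDED = re.compile(
    r'<div>(?:.{0,20}?)'
    r'<a\s+[^>]*?href\s*=\s*["\'](#_ftnref([0-9]+))["\'][^>]*>'
    r'[^<]*</a>'
    r'(.*?)</div>',
    re.S | re.I,
)

hit = BOUNDED.search(sample)
print(hit.group(0)[:20])
print(hit.group(1), hit.group(2))
```

In this sample the first div is more than 20 characters away from the link, so only the second div can satisfy the pattern.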
Upvotes: 1