user2992699

Reputation: 21

PHP Regex Match of Anchor Tag

I am trying to use a regex to split a string apart while keeping the delimiters. I want to split a very large and unpredictable string at its anchor tags. I am using HTML Tidy to ensure the tags are well formed; however, anything could come before or after the anchor tag I wish to match.

*PRECEDING-ANYTHING*<a *ANYTHING*>*ANYTHING*</a>*PROCEDING-ANYTHING*
*PRECEDING-ANYTHING*<a *ANYTHING*>*ANYTHING*</a>*PROCEDING-ANYTHING*

where the href URL could be anything and additional attributes such as 'target' could also be anything.

I've done a lot of searching and testing and either I am doing something wrong or the other answers on Stack Overflow do not apply.

Using

$parts = preg_split($pattern, $textWithAnchors, -1, PREG_SPLIT_DELIM_CAPTURE);

I was hoping to have $parts be similar to the following.

parts[0] is equal to *PRECEDING-ANYTHING*
parts[1] is equal to <a *ANYTHING*>*ANYTHING*</a>
and so forth

It is important that the regular expression capture the entire anchor tags and everything inside.

I would very much appreciate any help. I'm asking specifically for a regular expression that will accomplish this in PHP. I am aware that there are HTML parsers; however, using a regex is optimal in this situation. Maybe it will be a learning experience, though.

Upvotes: 2

Views: 2232

Answers (1)

Bohemian

Reputation: 425033

Using PREG_SPLIT_DELIM_CAPTURE won't help you, because that returns the text captured by parenthesized groups in the delimiter regex as separate elements, but you want the delimiters to be included with the elements.
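
As a small illustration of what the flag does (a sketch only; the sample string and the delimiter pattern are made up for this example and are not taken from the question): when the delimiter regex contains a capturing group, the captured text comes back as its own array element between the surrounding pieces.

```php
<?php
// Hypothetical sample input, for illustration only.
$text = 'before <a href="x">link</a> after';

// Assumed delimiter pattern: one capturing group around a whole anchor element.
// The /s modifier lets . match newlines inside the anchor body.
$pattern = '/(<a\b[^>]*>.*?<\/a>)/s';

// With PREG_SPLIT_DELIM_CAPTURE, the text captured by the group is
// returned as a separate element in the result array.
$parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

var_export($parts);
```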

To specify delimiters that don't consume input, use regex lookarounds.
This code does the job:

$parts= preg_split('/(?=<a)|(?<=\/a>)/', $textWithAnchors);

It's splitting using a look-ahead for the opening tag and a look-behind for the closing tag.
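
To make the behavior concrete, here is a minimal sketch that runs the same pattern on a made-up input (the sample string is an assumption, not the asker's data). Because both alternatives are zero-width, nothing is consumed, and each anchor ends up as its own element.

```php
<?php
// Hypothetical sample input with two anchors, for illustration only.
$textWithAnchors = 'one <a href="/x">X</a> two <a href="/y" target="_blank">Y</a> three';

// (?=<a)     zero-width match just before an opening "<a"
// (?<=\/a>)  zero-width match just after a closing "/a>"
$parts = preg_split('/(?=<a)|(?<=\/a>)/', $textWithAnchors);

foreach ($parts as $i => $part) {
    echo $i, ': ', $part, PHP_EOL;
}
```

Two caveats if you adapt this: `(?=<a)` also matches in front of other tags whose names start with "a", such as `<abbr>`, so `(?=<a\b)` is one way to narrow it. If the input begins or ends with an anchor, the result can contain empty strings, which the PREG_SPLIT_NO_EMPTY flag filters out.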

See a live demo of this code splitting your example as required.

Upvotes: 1
