Reputation: 21
I am trying to use REGEX to split a string apart while keeping the delimiters. I want to split a very large and unpredictable string on its anchor tags. I am using HTML Tidy to ensure the tags are well formed; however, anything could come before or after the anchor tag I wish to match.
*PRECEDING-ANYTHING*<a *ANYTHING*>*ANYTHING*</a>*PROCEDING-ANYTHING*
*PRECEDING-ANYTHING*<a *ANYTHING*>*ANYTHING*</a>*PROCEDING-ANYTHING*
where the href URL could be anything and additional attributes such as 'target' could also be anything.
I've done a lot of searching and testing and either I am doing something wrong or the other answers on Stack Overflow do not apply.
Using
$parts= preg_split($pattern, $textWithAnchors, -1, PREG_SPLIT_DELIM_CAPTURE)
I was hoping to have $parts be similar to the following.
parts[0] is equal to *PRECEDING-ANYTHING*
parts[1] is equal to <a *ANYTHING*>*ANYTHING*</a>
and so forth
It is important that the regular expression capture the entire anchor tags and everything inside.
I would very much appreciate any help. I'm asking specifically for a regular expression that will accomplish this in PHP. I am aware that there are HTML parsers; however, using REGEX is optimal in this situation. Maybe it will be a learning experience though.
Upvotes: 2
Views: 2232
Reputation: 425033
Using PREG_SPLIT_DELIM_CAPTURE won't help you, because that returns the text captured by parenthesized groups in the delimiter regex as separate elements, but you want the delimiters to be included with the elements.
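To illustrate what the flag does, here is a minimal sketch using a made-up comma-separated string (not the asker's data). The captured delimiter comes back as its own array element instead of staying attached to a neighbouring piece.

```php
<?php
// Illustrative input only; the comma stands in for any captured delimiter.
$sample = 'a,b';

// The parentheses capture the delimiter, so PREG_SPLIT_DELIM_CAPTURE
// returns it as a separate element between the split pieces.
$captured = preg_split('/(,)/', $sample, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($captured); // elements: "a", ",", "b"
```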
To specify delimiters that don't consume input, use regex lookarounds (look-ahead and look-behind assertions), which match a position rather than characters.
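As a small sketch of the idea, again with a made-up comma-separated string, splitting on a look-ahead leaves the delimiter at the start of the next piece, while splitting on a look-behind leaves it at the end of the previous piece. Nothing is removed from the input in either case.

```php
<?php
// Illustrative input only.
$sample = 'a,b,c';

// Look-ahead: split at each position that is followed by a comma.
$beforeComma = preg_split('/(?=,)/', $sample);

// Look-behind: split at each position that is preceded by a comma.
$afterComma = preg_split('/(?<=,)/', $sample);

print_r($beforeComma); // elements: "a", ",b", ",c"
print_r($afterComma);  // elements: "a,", "b,", "c"
```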
This code does the job:
$parts= preg_split('/(?=<a)|(?<=\/a>)/', $textWithAnchors);
It splits using a look-ahead for the opening tag and a look-behind for the closing tag.
See a live demo of this code splitting your example as required.
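For reference, a self-contained sketch of the same approach on a made-up input. Two details here are assumptions added for illustration rather than part of the code above: the `\b` after `<a`, so that tags such as `<abbr>` are not treated as anchors, and the PREG_SPLIT_NO_EMPTY flag, which drops the empty strings that can appear when the text begins or ends with an anchor.

```php
<?php
// Hypothetical sample input; the variable name follows the question.
$textWithAnchors = 'Intro <a href="/one" target="_blank">first</a>'
    . '<a href="/two">second</a> outro';

// (?=<a\b)    look-ahead: split just before an opening anchor tag.
//             The \b is an added assumption to skip tags like <abbr>.
// (?<=<\/a>)  look-behind: split just after a closing anchor tag.
$pattern = '/(?=<a\b)|(?<=<\/a>)/';

// PREG_SPLIT_NO_EMPTY (also an addition) discards empty pieces.
$parts = preg_split($pattern, $textWithAnchors, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
// elements:
//   "Intro "
//   "<a href="/one" target="_blank">first</a>"
//   "<a href="/two">second</a>"
//   " outro"
```

Because both assertions are zero-width, joining the pieces back together reproduces the original string. Like any regex over HTML, this assumes anchors are not nested and that `<a` or `</a>` does not occur inside attribute values or comments.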
Upvotes: 1