Frank

PHP - Preg match reversal?

How do you invert a regular expression in PHP?

This is my code:

preg_match("!<div class=\"foo\">.*?</div>!is", $source, $matches);

This checks the $source string for the first <div class="foo"> container and stores the match, including the div tags, in the $matches variable.
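
For reference, here is a quick sketch of what the current call returns. The $source string is a hypothetical sample with two matching divs. preg_match stops after the first match, so $matches[0] holds only the first div, tags included.

```php
<?php
// Hypothetical sample input containing two "foo" divs.
$source = "<p>Intro</p>\n<div class=\"foo\">first\nblock</div>\n<p>Middle</p><div class=\"foo\">second</div><p>End</p>";

// preg_match stops after the first match; $matches[0] is the full matched text.
$found = preg_match("!<div class=\"foo\">.*?</div>!is", $source, $matches);

var_dump($found);      // int(1)
var_dump($matches[0]); // the first div only, tags included
```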

What I want instead is to invert the expression, i.e. to get everything that is NOT inside the container. I know there is something called a negative lookahead, but I am really bad with regular expressions and didn't manage to come up with a working solution.

Simply adding ?! like this:

preg_match("?!<div class=\"foo\">.*?</div>!is", $source, $matches);

Does not seem to work.

Thanks!

Answers (2)

nhahtdh

New solution

Since your goal is to remove the matching divs (as mentioned in the comments), the simpler solution is to split on the original regex with preg_split and join the pieces back together with implode:

implode('', preg_split('~<div class="foo">.*?</div>~is', $text))

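
A minimal, runnable sketch of this approach, using hypothetical sample markup in $text. preg_split returns the pieces between the matches, which are exactly the parts outside the matching divs, and implode joins them back together.

```php
<?php
// Hypothetical sample markup: two "foo" divs surrounded by other content.
$text = "<p>Intro</p>\n<div class=\"foo\">first\nblock</div>\n<p>Middle</p><div class=\"foo\">second</div><p>End</p>";

// Split on each matching div: the pieces are the parts outside the divs.
$pieces = preg_split('~<div class="foo">.*?</div>~is', $text);

// Join the outside parts back together.
$outside = implode('', $pieces);

echo $outside, "\n";
```

Joining the pieces with an empty string gives the same result as calling preg_replace with the same pattern and an empty replacement string.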

Old solution

I'm not sure whether this is a good idea, but here is my solution:

~(.*?)(?:<div class="foo">.*?</div>|$)~is


The result can be picked out from capturing group 1 of each match.

Note that the last match is always an empty string, and group 1 can also be empty between two adjacent matching divs or when the string starts with a matching div. Since you need to concatenate the pieces anyway, this is a non-issue.
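
A sketch of how this pattern might be applied, assuming preg_match_all with its default PREG_PATTERN_ORDER, so that $matches[1] lists group 1 of every match. The $text sample is hypothetical.

```php
<?php
// Hypothetical sample markup: two "foo" divs surrounded by other content.
$text = "<p>Intro</p>\n<div class=\"foo\">first\nblock</div>\n<p>Middle</p><div class=\"foo\">second</div><p>End</p>";

// Group 1 captures the text before each matching div, or before the end of the string.
preg_match_all('~(.*?)(?:<div class="foo">.*?</div>|$)~is', $text, $matches);

// $matches[1] holds group 1 of every match; the last entry is always ''.
$outside = implode('', $matches[1]);

echo $outside, "\n";
```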

The idea relies on the fact that the lazy quantifier .*? always tries the rest of the pattern (whatever comes after it) before consuming another character. This behaves much like a lookahead assertion: .*? stops right before the next <div class="foo">.*?</div>, so whatever it matches never extends into a matching div.

Each match also consumes the matching div itself, so that the cursor advances past its closing tag. The $ alternative matches the end of the string, so the text after the last matching div is captured as well.

The s flag makes . match any character, including line breaks.

Revision: I had to change .+? to .*?, since .+? fails on strings with two matching divs next to each other and on strings that start with a matching div.
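
The two edge cases from the revision note can be reproduced by running both variants on a hypothetical input that starts with a matching div and contains two adjacent ones. outsideText() is a helper name made up for this sketch; it runs preg_match_all and concatenates group 1.

```php
<?php
// Hypothetical edge case: the input starts with a foo div, and two foo divs are adjacent.
$text = '<div class="foo">a</div><div class="foo">b</div>tail';

// Hypothetical helper: collect group 1 of every match and join the pieces.
function outsideText(string $pattern, string $text): string
{
    preg_match_all($pattern, $text, $matches);
    return implode('', $matches[1]);
}

$lazyStar = outsideText('~(.*?)(?:<div class="foo">.*?</div>|$)~is', $text);
$lazyPlus = outsideText('~(.+?)(?:<div class="foo">.*?</div>|$)~is', $text);

var_dump($lazyStar); // string(4) "tail"
var_dump($lazyPlus); // string(28) "<div class="foo">a</div>tail"
```

With .+?, group 1 must consume at least one character, so it cannot start at the first div; it swallows that div whole and only stops before the second one.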


Anyway, it's not a good idea to modify HTML with regular expressions. Use a parser instead.
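
The answer doesn't name a parser. As one possibility, here is a minimal sketch using PHP's bundled DOM extension (assuming it is enabled). The sample markup and the wrapper div with id="root" are hypothetical; the wrapper only gives the fragment a single container to serialize afterwards. The XPath test @class="foo" matches that exact attribute value, like the regex does.

```php
<?php
// Hypothetical sample markup.
$html = '<p>Intro</p><div class="foo">first</div><p>Middle</p><div class="foo">second</div><p>End</p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect libxml parse warnings instead of printing them
$doc->loadHTML('<div id="root">' . $html . '</div>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Copy the matching nodes into an array first, then detach each one.
foreach (iterator_to_array($xpath->query('//div[@class="foo"]')) as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize whatever is left inside the hypothetical wrapper div.
$root = $xpath->query('//div[@id="root"]')->item(0);
$outside = '';
foreach ($root->childNodes as $child) {
    $outside .= $doc->saveHTML($child);
}

echo $outside, "\n";
```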

vks

<div class=\"foo\">.*?</div>\K|.

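You can simply do this by using \K.

\K resets the starting point of the reported match: any previously consumed characters are no longer included in the final match. As a result, each matching div produces an empty match, while every other character is matched on its own by the . alternative.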
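
The answer doesn't show how the pattern is meant to be applied. One reading, sketched here as an assumption: run it through preg_match_all with the asker's is flags and concatenate the full matches in $matches[0]. The $text sample is hypothetical.

```php
<?php
// Hypothetical sample markup: two "foo" divs surrounded by other content.
$text = "<p>Intro</p>\n<div class=\"foo\">first\nblock</div>\n<p>Middle</p><div class=\"foo\">second</div><p>End</p>";

// A matching div is consumed and then dropped from the reported match by \K,
// leaving an empty match; any other character is matched one at a time by ".",
// which includes newlines because of the s flag.
preg_match_all('~<div class="foo">.*?</div>\K|.~is', $text, $matches);

// Concatenate every full match (empty strings for the divs, single characters otherwise).
$outside = implode('', $matches[0]);

echo $outside, "\n";
```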
