JAyenGreen

Reputation: 1435

preg_match_all has different result set than preg_replace using the same pattern

I find that preg_match_all and preg_replace do not find the same matches for the same pattern.

My pattern is:

/<(title|h1|h2|h3|h4|h5|ul|ol|p|figure|caption|span)(.*?)><\/(\1)>/
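For reference, here is a minimal sketch of what each group in that pattern is meant to capture. It assumes the pattern is written in a single-quoted PHP string, so that `\1` reaches PCRE as a backreference; `$m` is just an illustrative variable name.

```php
<?php
// Single quotes keep \1 as a regex backreference (backslash + "1").
$pattern = '/<(title|h1|h2|h3|h4|h5|ul|ol|p|figure|caption|span)(.*?)><\/(\1)>/';

if (preg_match($pattern, '<span class="blue"></span>', $m)) {
    // $m[1]: opening tag name
    // $m[2]: everything between the tag name and ">" (the attributes)
    // $m[3]: closing tag name, forced by \1 to equal $m[1]
    echo $m[1], PHP_EOL, $m[2], PHP_EOL, $m[3], PHP_EOL;
}
```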

When I run this against a snippet containing the likes of

<span class="blue"></span> 

with preg_match_all I get 17 matches.

When I use the same pattern in preg_replace, I get 0 matches. Replacing the \1 with the list of tag names does find the matches, but of course that won't work as a solution, because it no longer ensures that the closing tag is the same type as the opening tag.
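A small sketch of why the backreference matters: with the tag list repeated in the closing group, a mismatched pair is accepted, while `\1` rejects it. The `<span class="blue"></p>` input is a made-up example, and both patterns are single-quoted so the backslashes survive.

```php
<?php
// Backreference: the closing tag must repeat whatever group 1 captured.
$withBackref = '/<(title|h1|h2|h3|h4|h5|ul|ol|p|figure|caption|span)(.*?)><\/(\1)>/';
// Alternation in the closing tag: any listed name is accepted.
$withList = '/<(title|h1|h2|h3|h4|h5|ul|ol|p|figure|caption|span)(.*?)><\/(title|h1|h2|h3|h4|h5|ul|ol|p|figure|caption|span)>/';

$mismatched = '<span class="blue"></p>';

$backrefHits = preg_match($withBackref, $mismatched); // "p" is not "span", so no match
$listHits = preg_match($withList, $mismatched);       // "p" is in the list, so it matches

echo "backreference: $backrefHits, list: $listHits\n"; // backreference: 0, list: 1
```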

The overall goal is to find tags that have no content but should never appear empty... a holy crusade, I assure you.

To test whether the regex works, I also tried it in the PHP CLI. Here is the output:

Interactive shell

php > $str = 'abc<span class="blue"></span>def';
php > $pattern = "/<(title|h1|h2|h3|h4|h5|ul|ol|p|figure|caption|span)(.*?)><\/(\1)>/";
php > $final = preg_replace($pattern, '', $str);
php > print $final;
abc<span class="blue"></span>def

Upvotes: 0

Views: 193

Answers (1)

Jakumi

Reputation: 8374

$str = 'abc<span class="blue"></span>def';
$pattern = "/<(title|h1|h2|h3|h4|h5|ul|ol|p|figure|caption|span)(.*?)><\/(\\1)>/";
                                                              // added \  ^
$final = preg_replace($pattern, '', $str);
print $final;
// echoes 'abcdef'
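Once the backslash actually reaches PCRE, the two functions should agree. A quick sketch to check that, assuming a test string with two empty tags (the string is made up for illustration): preg_replace reports the number of replacements through its fifth, by-reference `$count` argument, which can be compared with the return value of preg_match_all.

```php
<?php
// Single-quoted: \1 reaches PCRE as a backreference.
$pattern = '/<(title|h1|h2|h3|h4|h5|ul|ol|p|figure|caption|span)(.*?)><\/(\1)>/';
$str = 'abc<span class="blue"></span>def<p></p>ghi';

// Number of empty tags found by preg_match_all.
$found = preg_match_all($pattern, $str, $matches);

// preg_replace's 5th argument receives the number of replacements made.
$cleaned = preg_replace($pattern, '', $str, -1, $replaced);

echo "$found $replaced $cleaned\n"; // 2 2 abcdefghi
```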

Explanation:

"\1" // <-- escape sequence: the character with octal code 1, i.e. chr(1)

is very different from

'\1' // <-- backslash followed by 1

because in a double-quoted string the first is an escape sequence, so PCRE never sees a backreference. This is also the reason I almost exclusively use single-quoted strings. See http://php.net/string#language.types.string.syntax.double
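The difference can be checked directly. This sketch compares the three spellings and confirms that the question's double-quoted pattern contains a raw chr(1) byte where the backreference was intended; the variable names are illustrative only.

```php
<?php
$double  = "\1";  // escape sequence: one byte with octal value 1, i.e. chr(1)
$single  = '\1';  // two bytes: a backslash followed by "1"
$escaped = "\\1"; // double-quoted with the backslash escaped: also backslash + "1"

var_dump(strlen($double), ord($double)); // int(1), then int(1)
var_dump($single === $escaped);          // bool(true)

// The question's double-quoted pattern therefore holds chr(1) instead of \1.
$broken = "/<(title|h1|h2|h3|h4|h5|ul|ol|p|figure|caption|span)(.*?)><\/(\1)>/";
var_dump(strpos($broken, chr(1)) !== false); // bool(true)
```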

Upvotes: 1
