SimoneS93

Reputation: 115

PHP - Inner HTML recursive replace

I need to perform a recursive str_replace on a portion of HTML (by recursive I mean inner nodes first), so I wrote:

$str = //get HTML;
$pttOpen = '(\w+) *([^<]{1,100}?)';
$pttClose = '\w+';
$pttHtml = '(?:(?!(?:<x-)).+)';

while (preg_match("%<x-(?:$pttOpen)>($pttHtml)*</x-($pttClose)>%m", $str, $match)) {
    list($outerHtml, $open, $attributes, $innerHtml, $close) = $match;
    $newHtml = //some work....
    str_replace($outerHtml, $newHtml, $str);
}

The idea is to replace non-nested x-tags first. But it only works if the innerHtml is on the same line as the opening tag (so I guess I misunderstood what the /m modifier does). I don't want to use a DOM library, because I just need simple string replacement. Any help?
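For reference, here is a minimal sketch of the difference between the two modifiers in PCRE: /m only changes where ^ and $ match, while /s is the one that lets . match a newline. The sample string and the variable names ($html, $withM, $withS) are made up for illustration and are not part of the code above.

```php
<?php
// Hypothetical sample input: the inner text sits on its own line.
$html = "<x-a>\nhello\n</x-a>";

// /m (multiline) only makes ^ and $ match at line boundaries;
// "." still refuses to match "\n", so the pattern cannot span lines.
$withM = preg_match('%<x-a>(.*)</x-a>%m', $html);

// /s (dotall) lets "." match newlines, so the tag body can span lines.
$withS = preg_match('%<x-a>(.*)</x-a>%s', $html, $match);

var_dump($withM); // int(0)
var_dump($withS); // int(1)
```

With /s, $match[1] holds the whole inner text, including both line breaks.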

Upvotes: 0

Views: 341

Answers (3)

Casimir et Hippolyte

Reputation: 89557

I don't know exactly what kind of changes you are trying to make, but this is how I would proceed:

$pattern = <<<'EOD'
~
    <x-(?<tagName>\w++) (?<attributes>[^>]*+) >
    (?<content>(?>[^<]++|<(?!/?x-))*) # far more efficient than (?:(?!</?x-).)*
    </x-\k<tagName>> # \k<tagName> is a backreference; \g<tagName> would be a subroutine call
~x
EOD;

function callback($m) { // example function
    // $m['attributes'] keeps its leading space, because /x ignores the literal space in the pattern
    return '<n-' . $m['tagName'] . $m['attributes'] . '>' . $m['content']
         . '</n-' . $m['tagName'] . '>';
}

do {
    $code = preg_replace_callback($pattern, 'callback', $code, -1, $count);
} while ($count);


echo htmlspecialchars(print_r($code, true));

Upvotes: 1

SimoneS93

Reputation: 115

Thanks to @Alex I came up with this:

%<x-(?P<open>\w+)\s*(?P<attributes>[^>]*?)>(?P<innerHtml>((?!<x-).)*)</x-(?P=open)>%is

Without the ((?!<x-).)* part in the innerHtml pattern it won't work with nested tags: it would match the outer ones first, which isn't what I wanted. With it, the innermost tags are matched first, and the s modifier lets . match newlines, so the inner HTML can span several lines. Hope this helps.
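A minimal sketch of how this pattern could be driven in a loop so that inner tags are rewritten before outer ones. The sample input, the $order array and the x- to n- renaming are assumptions for illustration only; the real replacement work would go in the callback.

```php
<?php
// Pattern from above; the lookahead keeps each match free of nested "<x-" tags.
$pattern = '%<x-(?P<open>\w+)\s*(?P<attributes>[^>]*?)>(?P<innerHtml>((?!<x-).)*)</x-(?P=open)>%is';

// Hypothetical input with one level of nesting, spread over several lines.
$str = "<x-foo>\n  a\n  <x-bar id='1'>b</x-bar>\n</x-foo>";

$order = []; // records which tag was rewritten on each step

do {
    $str = preg_replace_callback($pattern, function ($m) use (&$order) {
        $order[] = $m['open'];
        // Put back the space that \s* consumed, but only if attributes exist.
        $attrs = $m['attributes'] === '' ? '' : ' ' . $m['attributes'];
        // Placeholder transformation: rename x-* to n-* (illustrative only).
        return "<n-{$m['open']}{$attrs}>{$m['innerHtml']}</n-{$m['open']}>";
    }, $str, -1, $count);
} while ($count > 0); // stop once a pass replaces nothing

print_r($order); // bar first, then foo
echo $str, "\n";
```

The loop is needed because an outer tag only becomes matchable once every x- tag inside it has been rewritten.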

Upvotes: 1

Stephan

Reputation: 43033

Try this regex:

%<x-(?P<open>\w+)\s*(?P<attributes>[^>]*)>(?P<innerHtml>.*)</x-(?P=open)>%s

Demo

http://regex101.com/r/nA2zO5

Sample code

$str = ''; // placeholder: load the HTML here
$pattern = '%<x-(?P<open>\w+)\s*(?P<attributes>[^>]*)>(?P<innerHtml>.*)</x-(?P=open)>%s';

while (preg_match($pattern, $str, $matches)) {
    $newHtml = sprintf('<ns:%1$s>%2$s</ns:%1$s>', $matches['open'], $matches['innerHtml']);
    $str = str_replace($matches[0], $newHtml, $str);
}

echo htmlspecialchars($str);

Output

Initially, $str contained this text:

<x-foo>
    sdfgsdfgsd
       <x-bar>
           sdfgsdfg
       </x-bar>
       <x-baz attr1='5'>
           sdfgsdfg
       </x-baz>
    sdfgsdfgs
</x-foo>

It ends up with:

<ns:foo>
   sdfgsdfgsd
   <ns:bar>
       sdfgsdfg
   </ns:bar>
   <ns:baz>
       sdfgsdfg
   </ns:baz>
   sdfgsdfgs
</ns:foo>

Since I didn't know what work is done on $newHtml, I mimicked it by replacing x- with ns: and dropping any attributes.
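One caveat worth checking against your own data: because .* is greedy, this pattern matches the outermost tag first, and two sibling tags with the same name end up in a single match. A small sketch of that behaviour, using a made-up input string ($siblings) that is not part of the sample above:

```php
<?php
// Greedy pattern from this answer.
$greedy = '%<x-(?P<open>\w+)\s*(?P<attributes>[^>]*)>(?P<innerHtml>.*)</x-(?P=open)>%s';

// Hypothetical input: two sibling tags that share the same name.
$siblings = '<x-a>1</x-a><x-a>2</x-a>';

preg_match($greedy, $siblings, $m);

// Greedy .* runs up to the LAST </x-a>, so both siblings land in one match.
echo $m['innerHtml'], "\n"; // 1</x-a><x-a>2
```

If same-named siblings can occur, a pattern whose inner part cannot cross another x- tag avoids this.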

Upvotes: 1
