maidan

Reputation: 219

Splitting HTML into two parts

I am splitting an HTML text into two parts, which can leave broken HTML behind. I had a repair function that looks like this:

$html_intro = '<h3>Title</h3><p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.</p>';
$doc = new DOMDocument();
$doc->substituteEntities = false;
$content = mb_convert_encoding($html_intro, 'html-entities', 'utf-8');
@$doc->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$html_intro = $doc->saveHTML();

Since I added the LIBXML_HTML_NOIMPLIED option so that no body elements are added, the repair no longer works.

This is what I get from var_dump($html_intro):

string(613) "<h3>Title<p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.</p></h3>"

You can see a simple example here: https://onlinephp.io/c/d4ba5

Why is the <h3> tag getting broken like that? The <p> ends up nested inside the <h3>, and the closing </h3> moves to the very end.

It can be fixed by additionally running the result through tidy:

$html_intro = tidy_repair_string($html_intro, array('show-body-only' => true));

But that all looks very strange.
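
The output above is consistent with the parser treating the first element (<h3>) as the single document root once LIBXML_HTML_NOIMPLIED is set, so everything after it gets nested inside. A possible alternative to the tidy pass, sketched under that assumption, is to parse the fragment inside a temporary wrapper element and serialize only the wrapper's children. The function name repair_html_fragment and the <div> wrapper are made up for illustration, and mb_encode_numericentity stands in here for the 'html-entities' conversion:

```php
<?php
// Hypothetical helper (not part of the original code): repairs an HTML
// fragment by parsing it inside a temporary <div> wrapper, so the parser
// has a single root element to work with.
function repair_html_fragment(string $html): string
{
    // Turn non-ASCII characters into numeric entities so the parser does
    // not have to guess the encoding (assumes the input is UTF-8).
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $doc = new DOMDocument();

    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML(
        '<div>' . $encoded . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialize each child of the wrapper; the wrapper itself is dropped.
    $result = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }

    return $result;
}

$html_intro = repair_html_fragment('<h3>Title</h3><p>Lorem ipsum</p>');
var_dump($html_intro);
```

With the wrapper in place, the <h3> and the <p> should stay siblings, and unclosed tags left over from the split should be closed by the parser when it reaches the end of the wrapper. Whether this is cleaner than the tidy call is a judgment call; it mainly avoids the second repair pass.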

Upvotes: 1

Views: 46

Answers (0)
