Reputation: 707
I am working on a blog layout and I need to create an abstract of each post (say, the 15 latest) to show on the homepage. The content I use is already formatted in HTML tags by the Textile library. If I use substr to get the first 500 chars of a post, the main problem I face is how to close the unclosed tags.
e.g.
<div>.......................</div>
<div>...........
<p>............</p>
<p>...........| 500 chars
</p>
</div>
What I get is two unclosed tags, <p> and <div>. The <p> won't cause much trouble, but the <div> breaks the whole page layout. Any suggestions on how to track the opening tags and close them manually, or some other approach?
Upvotes: 15
Views: 27237
Reputation: 2100
I found a solution which uses DOMDocument but does not add extra tags to your strings; it just fixes malformed HTML. See the answer here: https://stackoverflow.com/a/79081559/492132
Original github (not mine) here: https://gist.github.com/hubgit/1322324
Upvotes: 0
Reputation: 928
You can use DOMDocument to do it, but be careful of string encoding issues. Also, you'll have to use a complete HTML document, then extract the components you want. Here's an example:
function make_excerpt ($rawHtml, $length = 500) {
    // append an ellipsis and "More" link
    // (substr() counts bytes; mb_substr() is safer for multibyte text)
    $content = substr($rawHtml, 0, $length)
        . '… <a href="/link-to-somewhere">More ></a>';

    // Detect the string encoding; mb_detect_encoding() returns false
    // when detection fails, so fall back to UTF-8 in that case
    $encoding = mb_detect_encoding($content) ?: 'UTF-8';

    // pass it to the DOMDocument constructor
    $doc = new DOMDocument('', $encoding);

    // Must include the content-type/charset meta tag with $encoding
    // Bad HTML will trigger warnings, suppress those
    @$doc->loadHTML('<html><head>'
        . '<meta http-equiv="content-type" content="text/html; charset='
        . $encoding . '"></head><body>' . trim($content) . '</body></html>');

    // extract the components we want
    $nodes = $doc->getElementsByTagName('body')->item(0)->childNodes;
    $html = '';
    $len = $nodes->length;
    for ($i = 0; $i < $len; $i++) {
        $html .= $doc->saveHTML($nodes->item($i));
    }
    return $html;
}
$html = "<p>.......................</p>
<p>...........
<p>............</p>
<p>...........| 500 chars";
// output fixed html
echo make_excerpt($html, 500);
Outputs:
<p>.......................</p>
<p>...........
</p>
<p>............</p>
<p>...........| 500 chars… <a href="/link-to-somewhere">More ></a></p>
If you are using WordPress, you should wrap the substr() invocation in a call to wpautop: wpautop(substr(...)). You may also wish to test the length of the $rawHtml passed to the function, and skip appending the "More" link if it isn't long enough.
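The length check suggested above could look roughly like the following. This is only a sketch: the function name make_excerpt_checked and the $moreUrl parameter are made up for illustration, and it assumes the post is UTF-8. Posts that already fit within $length are returned untouched, so they get neither truncation nor a "More" link.

```php
<?php
// Hypothetical helper (name and $moreUrl parameter are assumptions).
function make_excerpt_checked(string $rawHtml, int $length = 500, string $moreUrl = '/link-to-somewhere'): string
{
    // Short posts need neither truncation nor a "More" link.
    if (strlen($rawHtml) <= $length) {
        return $rawHtml;
    }

    // substr() counts bytes, so a multibyte character may be cut in half.
    $content = substr($rawHtml, 0, $length)
        . '&hellip; <a href="' . htmlspecialchars($moreUrl) . '">More</a>';

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Assumption: the content is UTF-8, declared through the meta tag.
    $doc->loadHTML('<html><head>'
        . '<meta http-equiv="content-type" content="text/html; charset=utf-8">'
        . '</head><body>' . $content . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialize only the children of <body>, i.e. the repaired fragment.
    $html = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $node) {
        $html .= $doc->saveHTML($node);
    }
    return $html;
}
```

Calling make_excerpt_checked($post, 500) on a short post returns it unchanged; on a long post, the result has its open tags closed by the parser, with the link placed inside the last open element.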
Upvotes: 3
Reputation: 1238
As ajreal said, DOMDocument is a solution.
Example:
$str = "
<html>
<head>
<title>test</title>
</head>
<body>
<p>error</i>
</body>
</html>
";
$doc = new DOMDocument();
@$doc->loadHTML($str);
echo $doc->saveHTML();
Advantage: DOMDocument is natively included in PHP, unlike PHP Tidy.
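The @ operator in the example hides every error raised by that call. As a possible alternative, libxml's internal error handling can collect the parser warnings so they can be inspected or logged. The sketch below assumes a hypothetical wrapper named repair_html_document; it is not part of DOMDocument itself.

```php
<?php
// Hypothetical wrapper (the name is an assumption for illustration).
// Returns the repaired document and the parser messages it produced.
function repair_html_document(string $html): array
{
    // Route libxml warnings into an internal buffer instead of PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Copy the collected messages before clearing the buffer.
    $warnings = [];
    foreach (libxml_get_errors() as $error) {
        $warnings[] = trim($error->message);
    }
    libxml_clear_errors();

    // Restore whatever error mode was active before.
    libxml_use_internal_errors($previous);

    return [$doc->saveHTML(), $warnings];
}

[$fixed, $warnings] = repair_html_document(
    '<html><head><title>test</title></head><body><p>error</i></body></html>'
);
echo $fixed;
```

Restoring the previous error mode keeps the change local to this call, which matters if other code in the same request relies on the default behaviour.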
Upvotes: 19
Reputation: 47321
There are lots of methods that can be used:
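One such method, sketched here purely as an illustration and not necessarily one this answer had in mind, is to track opening tags on a stack and append the missing closing tags, which is what the question asks about doing manually. The function name close_open_tags is made up. The regex is a simplification: it does not handle ">" inside attribute values, and it knows nothing about HTML's implied end tags (such as a <p> closing an earlier <p>).

```php
<?php
// Hypothetical helper (the name is an assumption): closes tags left open
// by naive truncation, using a stack of currently open element names.
function close_open_tags(string $html): string
{
    // Void elements never take a closing tag.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'];

    // Drop a tag that the truncation cut off mid-way, e.g. "<a href=".
    $html = preg_replace('/<[^>]*$/', '', $html);

    // Groups: 1 = "/" for a closing tag, 2 = tag name, 3 = "/" if self-closing.
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>#', $html, $matches, PREG_SET_ORDER);

    $stack = [];
    foreach ($matches as [, $closing, $name, $selfClosing]) {
        $name = strtolower($name);
        if ($selfClosing === '/' || in_array($name, $void, true)) {
            continue;
        }
        if ($closing === '') {
            $stack[] = $name; // opening tag: remember it
            continue;
        }
        // Closing tag: find the most recent matching opener and drop it,
        // together with anything opened after it.
        $index = array_search($name, array_reverse($stack, true), true);
        if ($index !== false) {
            $stack = array_slice($stack, 0, $index);
        }
    }

    // Close whatever is still open, innermost first.
    while ($stack) {
        $html .= '</' . array_pop($stack) . '>';
    }
    return $html;
}

echo close_open_tags('<div>a</div><div>b<p>c</p><p>d');
// prints: <div>a</div><div>b<p>c</p><p>d</p></div>
```

A real parser such as DOMDocument copes with far more malformed input than this; the stack approach is mainly useful when pulling in a parser is undesirable and the markup is known to be simple.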
Upvotes: 18