Reputation: 485
I am trying to cut off text after 236 characters without cutting words in half, while preserving HTML tags. This is what I am using right now:
$shortdesc = $_helper->productAttribute($_product, $_product->getShortDescription(), 'short_description');
$length = 236;
echo substr($shortdesc, 0, strrpos(substr($shortdesc, 0, $length), " "));
While this works in most cases, it does not respect HTML tags. For example, this text:
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. <strong>Stet clita kasd gubergren</strong>
will get cut off with the tag still open. Is there any way to cut off text after 236 characters while respecting HTML tags?
Upvotes: 18
Views: 18629
Reputation: 1
This version works with Unicode (based on @nice ass's answer):
class Html
{
    protected
        $reachedLimit = false,
        $totalLen = 0,
        $maxLen = 25,
        $toRemove = [];

    public static function trim($html, $maxLen = 25)
    {
        $dom = new \DOMDocument();
        $dom->loadHTML('<?xml encoding="UTF-8">' . $html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

        $instance = new static();
        $toRemove = $instance->walk($dom, $maxLen);

        // remove any nodes that exceed limit
        foreach ($toRemove as $child) {
            $child->parentNode->removeChild($child);
        }

        return $dom->saveHTML();
    }

    protected function walk(\DOMNode $node, $maxLen)
    {
        if ($this->reachedLimit) {
            $this->toRemove[] = $node;
        } else {
            // only text nodes should have text,
            // so do the splitting here
            if ($node instanceof \DOMText) {
                // use mb_strlen / mb_substr for UTF-8 support
                $this->totalLen += $nodeLen = mb_strlen($node->nodeValue);

                if ($this->totalLen > $maxLen) {
                    $node->nodeValue = mb_substr($node->nodeValue, 0, $nodeLen - ($this->totalLen - $maxLen)) . '...';
                    $this->reachedLimit = true;
                }
            }

            // if node has children, walk its child elements
            if (isset($node->childNodes)) {
                foreach ($node->childNodes as $child) {
                    $this->walk($child, $maxLen);
                }
            }
        }

        return $this->toRemove;
    }
}
Upvotes: 0
Reputation: 16709
This should do it:
class Html
{
    protected
        $reachedLimit = false,
        $totalLen = 0,
        $maxLen = 25,
        $toRemove = array();

    public static function trim($html, $maxLen = 25)
    {
        $dom = new DomDocument();

        if (version_compare(PHP_VERSION, '5.4.0') < 0) {
            $dom->loadHTML($html);
        } else {
            $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
        }

        $instance = new static();
        $toRemove = $instance->walk($dom, $maxLen);

        // remove any nodes that exceed limit
        foreach ($toRemove as $child) {
            $child->parentNode->removeChild($child);
        }

        // remove wrapper tags added by DOMDocument (doctype, html...)
        if (version_compare(PHP_VERSION, '5.4.0') < 0) {
            // http://stackoverflow.com/a/6953808/1058140
            $dom->removeChild($dom->firstChild);
            $dom->replaceChild($dom->firstChild->firstChild->firstChild, $dom->firstChild);

            return $dom->saveHTML();
        }

        return $dom->saveHTML();
    }

    protected function walk(DomNode $node, $maxLen)
    {
        if ($this->reachedLimit) {
            $this->toRemove[] = $node;
        } else {
            // only text nodes should have text,
            // so do the splitting here
            if ($node instanceof DomText) {
                // use mb_strlen / mb_substr for UTF-8 support
                $this->totalLen += $nodeLen = strlen($node->nodeValue);

                if ($this->totalLen > $maxLen) {
                    $node->nodeValue = substr($node->nodeValue, 0, $nodeLen - ($this->totalLen - $maxLen)) . '...';
                    $this->reachedLimit = true;
                }
            }

            // if node has children, walk its child elements
            if (isset($node->childNodes)) {
                foreach ($node->childNodes as $child) {
                    $this->walk($child, $maxLen);
                }
            }
        }

        return $this->toRemove;
    }
}
Use like: $str = Html::trim($str, 236);
Performance-wise there's very little difference, and at very large string sizes DomDocument is actually faster. In my opinion, reliability is more important than saving a few microseconds.
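The class above cuts the last text node at the exact character limit, so it can still split a word, which the question wanted to avoid. A minimal sketch of a word-boundary cut that could be applied to that last text node instead of the plain substr call. The helper name cutAtWordBoundary is hypothetical and not part of the answer, and it assumes the mbstring extension is available:

```php
<?php
// Hypothetical helper: cut $text to at most $limit characters,
// backing up to the last space so that no word is split.
function cutAtWordBoundary(string $text, int $limit): string
{
    if (mb_strlen($text) <= $limit) {
        return $text; // already short enough, nothing to cut
    }

    $cut = mb_substr($text, 0, $limit);

    // If the character right after the cut is a space,
    // the cut already ends on a whole word.
    if (mb_substr($text, $limit, 1) === ' ') {
        return $cut;
    }

    $lastSpace = mb_strrpos($cut, ' ');

    // No space at all: fall back to the hard cut.
    return $lastSpace === false ? $cut : mb_substr($cut, 0, $lastSpace);
}

echo cutAtWordBoundary('Stet clita kasd gubergren', 20), "\n"; // Stet clita kasd
```

Note that the result can be noticeably shorter than the limit when the last word is long.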
Upvotes: 18
Reputation: 841
Here is a JS solution: trim-html
The idea is to split the HTML string into an array whose elements are either HTML tags (opening or closing) or plain text.
var arr = html.replace(/</g, "\n<")
.replace(/>/g, ">\n")
.replace(/\n\n/g, "\n")
.replace(/^\n/g, "")
.replace(/\n$/g, "")
.split("\n");
Then we can iterate through the array and count characters.
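The same split-and-count idea, sketched in PHP to match the question. This is only an illustration, not the trim-html library's code: the function names tokenizeHtml and visibleLength are hypothetical, and the regex assumes well-formed markup with no `<` or `>` inside text or attribute values. Entities such as `&amp;` are counted as several characters here.

```php
<?php
// Split an HTML string into tokens that are either a tag or a run of plain text.
function tokenizeHtml(string $html): array
{
    return preg_split(
        '/(<[^>]*>)/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

// Count only the visible characters, skipping tag tokens.
function visibleLength(array $tokens): int
{
    $length = 0;
    foreach ($tokens as $token) {
        if ($token[0] !== '<') {
            $length += mb_strlen($token);
        }
    }
    return $length;
}

$tokens = tokenizeHtml('Lorem <strong>ipsum</strong> dolor');
// $tokens: 'Lorem ', '<strong>', 'ipsum', '</strong>', ' dolor'
echo visibleLength($tokens), "\n"; // 17
```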
Upvotes: -2
Reputation: 7074
I did this in JS; I hope the logic helps in PHP too.
splitText : function(content, count){
    var originalContent = content;
    content = content.substring(0, count);
    // If the cut point falls inside an HTML tag (a "<" without a matching ">" before the cut).
    if (content.lastIndexOf("<") > content.lastIndexOf(">")){
        content = content.substring(0, content.lastIndexOf('<'));
        count = content.length;
        if(originalContent.indexOf("</", count)!=-1){
            content += originalContent.substring(count, originalContent.indexOf('>', originalContent.indexOf("</", count))+1);
        }else{
            content += originalContent.substring(count, originalContent.indexOf('>', count)+1);
        }
    // If the cut point falls between an opening tag and its closing tag.
    }else if(content.lastIndexOf("<") != content.lastIndexOf("</")){
        content = originalContent.substring(0, originalContent.indexOf('>', count)+1);
    }
    return content;
},
Hope this logic helps someone.
Upvotes: -2
Reputation: 1967
function limitStrlen($input, $length, $ellipses = true, $strip_html = true, $skip_html = false)
{
    // strip tags, if desired
    if ($strip_html || !$skip_html)
    {
        $input = strip_tags($input);

        // no need to trim, already shorter than trim length
        if (strlen($input) <= $length)
        {
            return $input;
        }

        // find last space within length
        $last_space = strrpos(substr($input, 0, $length), ' ');
        if ($last_space !== false)
        {
            $trimmed_text = substr($input, 0, $last_space);
        }
        else
        {
            $trimmed_text = substr($input, 0, $length);
        }
    }
    else
    {
        // compare the visible text only, ignoring tags
        if (strlen(strip_tags($input)) <= $length)
        {
            return $input;
        }

        $trimmed_text = $input;

        // drop one word at a time until the visible text fits
        while (true)
        {
            $last_space = strrpos($trimmed_text, ' ');
            if ($last_space !== false)
            {
                $trimmed_text = substr($trimmed_text, 0, $last_space);
                if (strlen(strip_tags($trimmed_text)) <= $length)
                {
                    break;
                }
            }
            else
            {
                $trimmed_text = substr($trimmed_text, 0, $length);
                break;
            }
        }

        // close unclosed tags.
        $doc = new DOMDocument();
        $doc->loadHTML($trimmed_text);
        $trimmed_text = $doc->saveHTML();
    }

    // add ellipses (...)
    if ($ellipses)
    {
        $trimmed_text .= '...';
    }

    return $trimmed_text;
}
$str = "<h1><strong><span>Lorem</span></strong> <i>ipsum</i> <p class='some-class'>dolor</p> sit amet, consetetur.</h1>";
// view the HTML
echo htmlentities(limitStrlen($str, 22, false, false, true), ENT_COMPAT, 'UTF-8');
// view the result
echo limitStrlen($str, 22, false, false, true);
Note: There may be a better way to close tags than using DOMDocument. For example, we can put a p tag inside an h1 tag and it will still work. But in this case the heading tag will be closed before the p tag, because in theory a p tag is not allowed inside it. So be careful about HTML's strict standards.
Upvotes: -1
Reputation: 4127
The best solution I have come across for this is from the CakePHP framework's TextHelper class.
Here is the method:
/**
* Truncates text.
*
* Cuts a string to the length of $length and replaces the last characters
* with the ending if the text is longer than length.
*
* ### Options:
*
* - `ending` Will be used as Ending and appended to the trimmed string
* - `exact` If false, $text will not be cut mid-word
* - `html` If true, HTML tags would be handled correctly
*
* @param string $text String to truncate.
* @param integer $length Length of returned string, including ellipsis.
* @param array $options An array of html attributes and options.
* @return string Trimmed string.
* @access public
* @link http://book.cakephp.org/view/1469/Text#truncate-1625
*/
function truncate($text, $length = 100, $options = array()) {
$default = array(
'ending' => '...', 'exact' => true, 'html' => false
);
$options = array_merge($default, $options);
extract($options);
if ($html) {
if (mb_strlen(preg_replace('/<.*?>/', '', $text)) <= $length) {
return $text;
}
$totalLength = mb_strlen(strip_tags($ending));
$openTags = array();
$truncate = '';
preg_match_all('/(<\/?([\w+]+)[^>]*>)?([^<>]*)/', $text, $tags, PREG_SET_ORDER);
foreach ($tags as $tag) {
if (!preg_match('/img|br|input|hr|area|base|basefont|col|frame|isindex|link|meta|param/s', $tag[2])) {
if (preg_match('/<[\w]+[^>]*>/s', $tag[0])) {
array_unshift($openTags, $tag[2]);
} else if (preg_match('/<\/([\w]+)[^>]*>/s', $tag[0], $closeTag)) {
$pos = array_search($closeTag[1], $openTags);
if ($pos !== false) {
array_splice($openTags, $pos, 1);
}
}
}
$truncate .= $tag[1];
$contentLength = mb_strlen(preg_replace('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', ' ', $tag[3]));
if ($contentLength + $totalLength > $length) {
$left = $length - $totalLength;
$entitiesLength = 0;
if (preg_match_all('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', $tag[3], $entities, PREG_OFFSET_CAPTURE)) {
foreach ($entities[0] as $entity) {
if ($entity[1] + 1 - $entitiesLength <= $left) {
$left--;
$entitiesLength += mb_strlen($entity[0]);
} else {
break;
}
}
}
$truncate .= mb_substr($tag[3], 0 , $left + $entitiesLength);
break;
} else {
$truncate .= $tag[3];
$totalLength += $contentLength;
}
if ($totalLength >= $length) {
break;
}
}
} else {
if (mb_strlen($text) <= $length) {
return $text;
} else {
$truncate = mb_substr($text, 0, $length - mb_strlen($ending));
}
}
if (!$exact) {
$spacepos = mb_strrpos($truncate, ' ');
if ($spacepos !== false) {
if ($html) {
$bits = mb_substr($truncate, $spacepos);
preg_match_all('/<\/([a-z]+)>/', $bits, $droppedTags, PREG_SET_ORDER);
if (!empty($droppedTags)) {
foreach ($droppedTags as $closingTag) {
if (!in_array($closingTag[1], $openTags)) {
array_unshift($openTags, $closingTag[1]);
}
}
}
}
$truncate = mb_substr($truncate, 0, $spacepos);
}
}
$truncate .= $ending;
if ($html) {
foreach ($openTags as $tag) {
$truncate .= '</'.$tag.'>';
}
}
return $truncate;
}
Other frameworks may have similar (or different) solutions to this problem, so you could take a look at them too. My familiarity with Cake is what prompted me to link to their solution.
Edit:
I just tested this method in an app I'm working on, using the OP's text:
<?php
echo truncate(
'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. <strong>Stet clita kasd gubergren</strong>',
236,
array('html' => true, 'ending' => ''));
?>
Output:
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. <strong>Stet clita kasd gubegre</strong>
Notice that the output stops just short of completing the last word, but includes the complete strong tags.
Upvotes: 19
Reputation: 753
Can I just offer a thought?
Sample text:
Lorem ipsum dolor sit amet, <i class="red">magna aliquyam erat</i>, duo dolores et ea rebum. <strong>Stet clita kasd gubergren</strong> hello
First, parse it into:
array(
'0' => array(
'tag' => '',
'text' => 'Lorem ipsum dolor sit amet, '
),
'1' => array(
'tag' => '<i class="red">',
'text' => 'magna aliquyam erat',
),
'2' => ......
'3' => ......
)
Then cut the text segments one by one, wrap each one with its tag after cutting, and join them.
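A minimal sketch of this parse, cut and rejoin idea, under some loud assumptions: the function name truncateSegments and the short list of void tags are hypothetical, the regex assumes well-formed markup with no `<` or `>` inside text or attribute values, and the cut is made at the exact character, so it can still split a word. It tracks open tags on a stack instead of storing one tag per segment, so that nested tags are closed in the right order.

```php
<?php
// Hypothetical sketch: truncate to $limit visible characters, keeping tags balanced.
function truncateSegments(string $html, int $limit): string
{
    // Tags that never get a closing tag (partial list, for illustration only).
    $void = ['br', 'hr', 'img', 'input', 'link', 'meta'];

    // Each match pairs an optional tag with the plain text that follows it.
    preg_match_all('/(<[^>]*>)?([^<]*)/', $html, $segments, PREG_SET_ORDER);

    $out = '';
    $open = []; // stack of currently open tag names
    $used = 0;  // visible characters emitted so far

    foreach ($segments as $segment) {
        $tag = $segment[1];
        $text = $segment[2];
        $isClosing = strncmp($tag, '</', 2) === 0;

        if ($used >= $limit && $tag !== '' && !$isClosing) {
            break; // budget spent: do not open anything new
        }

        if ($isClosing) {
            // Pop the tag if it closes the innermost open element.
            if (preg_match('/^<\/(\w+)/', $tag, $m) && end($open) === $m[1]) {
                array_pop($open);
            }
        } elseif (preg_match('/^<(\w+)/', $tag, $m)
            && substr($tag, -2) !== '/>'
            && !in_array(strtolower($m[1]), $void, true)) {
            $open[] = $m[1];
        }
        $out .= $tag;

        $remaining = $limit - $used;
        if (mb_strlen($text) > $remaining) {
            $out .= mb_substr($text, 0, $remaining); // cut this segment and stop
            break;
        }
        $out .= $text;
        $used += mb_strlen($text);
    }

    // Close whatever is still open, innermost first.
    while ($open) {
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}

echo truncateSegments('Lorem <strong>ipsum</strong> dolor', 8), "\n"; // Lorem <strong>ip</strong>
```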
Upvotes: 1