user967451

Truncate text without truncating HTML

This string has 78 characters with HTML and 39 characters without HTML:

<p>I really like the <a href="http://google.com">Google</a> search engine.</p>
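Both counts can be confirmed by comparing `strlen()` on the raw string with `strlen()` on its `strip_tags()` output. This is a minimal sketch that assumes the string is plain ASCII, so that byte counts equal character counts.

```php
<?php
// The example string from the question.
$html = '<p>I really like the <a href="http://google.com">Google</a> search engine.</p>';

// strlen() counts bytes; this string is plain ASCII, so bytes equal characters.
$withHtml = strlen($html);                // length including the tags
$withoutHtml = strlen(strip_tags($html)); // length of the visible text only

echo $withHtml, "\n";    // 78
echo $withoutHtml, "\n"; // 39
```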

I want to truncate this string based on the non-HTML character count. For example, truncating the string above to 24 characters should produce:

<p>I really like the <a href="http://google.com">Google</a></p>

The cut point is determined only by the visible (tag-stripped) character count; the HTML is ignored when counting. Even so, no HTML tags are left unclosed.
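The two requirements (the visible text is exactly the first N visible characters, and every opened tag is closed) can be expressed as a check on a candidate output. The helper below, `meets_requirements()`, is hypothetical and not part of the question; it is a sketch that assumes the HTML contains no void or self-closing elements such as `<br>`.

```php
<?php
// Hypothetical helper: checks a truncated string against the two requirements.
// Assumes no void or self-closing elements such as <br> or <img />.
function meets_requirements(string $original, string $truncated, int $n): bool
{
    // 1. The visible text must be exactly the first $n visible characters.
    if (strip_tags($truncated) !== substr(strip_tags($original), 0, $n)) {
        return false;
    }

    // 2. Every opening tag must be closed, innermost first.
    preg_match_all('/<(\/?)([a-z][a-z0-9]*)[^>]*>/i', $truncated, $matches, PREG_SET_ORDER);
    $open = [];
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[1] === '') {
            $open[] = $name;              // opening tag: push it
        } elseif (array_pop($open) !== $name) {
            return false;                 // closing tag does not match the innermost open tag
        }
    }
    return $open === [];                  // nothing may be left open
}

$original = '<p>I really like the <a href="http://google.com">Google</a> search engine.</p>';
var_dump(meets_requirements($original, '<p>I really like the <a href="http://google.com">Google</a></p>', 24)); // bool(true)
var_dump(meets_requirements($original, '<p>I really like the <a href="http://google.com">Google', 24));           // bool(false)
```

The second call fails because `<p>` and `<a>` are still open, even though the visible text is correct.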

Answers (1)

user967451

Alright, so this is what I put together, and it seems to work:

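function truncate_html($string, $length, $postfix = '&hellip;', $isHtml = true) {
    // Note: strlen()/substr() count bytes, so multibyte UTF-8 text may be cut mid-character.
    $string = trim($string);
    // Append the postfix only if the visible text is actually shortened.
    $postfix = (strlen(strip_tags($string)) > $length) ? $postfix : '';
    $i = 0;     // bytes taken up by the tags seen so far
    $tags = []; // stack of open tags; change to array() if php version < 5.4
    // Void elements have no closing tag, so they must never go on the stack.
    $voidTags = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
                 'link', 'meta', 'param', 'source', 'track', 'wbr'];

    if ($isHtml) {
        // Each match is one tag plus the text that follows it, up to the next tag.
        preg_match_all('/<[^>]+>([^<]*)/', $string, $tagMatches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
        foreach ($tagMatches as $tagMatch) {
            // Offset minus tag bytes = visible characters before this tag; stop at the limit.
            if ($tagMatch[0][1] - $i >= $length) {
                break;
            }

            $tagLength = $tagMatch[1][1] - $tagMatch[0][1];
            $rawTag = substr($tagMatch[0][0], 0, $tagLength);

            // Comments and doctypes do not match this pattern and are skipped.
            if (preg_match('/^<(\/?)([a-z][a-z0-9-]*)/i', $rawTag, $m)) {
                $name = strtolower($m[2]);
                $selfClosing = substr(rtrim($rawTag, " \t\n\r>"), -1) === '/';
                if ($m[1] === '/') {
                    // Closing tag: pop it if it matches the innermost open tag.
                    if (end($tags) === $name) {
                        array_pop($tags);
                    }
                } elseif (!$selfClosing && !in_array($name, $voidTags, true)) {
                    $tags[] = $name;
                }
            }

            $i += $tagLength;
        }
    }

    // Cut after $length visible characters plus the bytes of the tags before the cut.
    $truncated = substr($string, 0, min(strlen($string), $length + $i));

    // Close the tags that are still open, innermost first.
    foreach (array_reverse($tags) as $tag) {
        $truncated .= '</' . $tag . '>';
    }

    return $truncated . $postfix;
}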
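To see why the offset arithmetic works, here is a standalone walkthrough (a sketch, not part of the original snippet; `$visiblePositions` is a hypothetical name) that runs the same `preg_match_all()` call on the question's example. For each tag it prints the byte offset and the number of visible characters before it, i.e. the offset minus the tag bytes `$i` accumulated so far.

```php
<?php
// Same regex and flags as truncate_html(), applied to the question's example.
$html = '<p>I really like the <a href="http://google.com">Google</a> search engine.</p>';
preg_match_all('/<[^>]+>([^<]*)/', $html, $tagMatches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

$i = 0;                  // running total of tag bytes
$visiblePositions = [];  // visible characters preceding each tag
foreach ($tagMatches as $tagMatch) {
    $tagOffset = $tagMatch[0][1];               // byte offset of the '<'
    $tagLength = $tagMatch[1][1] - $tagOffset;  // bytes in the tag itself
    $visiblePositions[] = $tagOffset - $i;
    printf("%-30s offset %2d, visible position %2d\n",
        substr($tagMatch[0][0], 0, $tagLength), $tagOffset, $tagOffset - $i);
    $i += $tagLength;
}
```

The visible positions come out as 0, 18, 24 and 39. With `$length = 24`, the loop in `truncate_html()` stops at `</a>` (24 >= 24) with `p` and `a` still open and `$i = 31`, so the cut lands at byte 24 + 31 = 55, immediately before `</a>`.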

Usage:

truncate_html('<p>I really like the <a href="http://google.com">Google</a> search engine.</p>', 24);
// returns '<p>I really like the <a href="http://google.com">Google</a></p>&hellip;'

The function is adapted, with a small modification, from:

http://www.dzone.com/snippets/truncate-text-preserving-html
