Kokos

Properly substr() a string that contains <a> elements with PHP

I am currently writing a class that fetches Facebook and Twitter feeds, and then combines them into one for showing on the website.

However, I'm running into a problem when limiting the length of the text: a simple substr() call can cut an <a> element in half and leave it unclosed.

So imagine I have this string:

'Check out our site at <a href="http://site.com/">site.com</a>'

I want to limit this to 50 characters. If I simply call substr($input,0,50), I end up with:

'Check out our site at <a href="http://site.com/">s'

That leaves an unclosed <a> element, which turns the rest of my website into a link.

I figured that with DOMDocument I could temporarily replace each full link with just the text between <a> and </a>, take the substring, and then re-apply the link.

However, I can't figure out how to do this, and it raises another problem: even if I manage to replace the link temporarily, the substring may end up with only half of the link text:

'Check out our site at sit'

Then it would be hard to re-apply the link, so it's probably better to replace the link with a placeholder like [[id]] and have the script remember how long the link text was.
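A minimal sketch of the "set the link aside and count only its text" idea, swapping the literal [[id]] placeholders for preg_split(): the string is split into plain-text and link segments, only visible text counts toward the limit, and a link that crosses the limit is cut inside its own <a> element so it stays closed. truncate_links() is a hypothetical name; the sketch assumes links contain plain text only (all other tags already removed with strip_tags()) and that byte-based substr() is acceptable (multibyte text would need mb_substr()).

```php
<?php
// Hypothetical helper: truncates $html to $limit visible characters.
// Only the text inside <a> elements counts toward the limit, never the markup,
// and a link cut by the limit keeps its closing </a>.
// Assumes links hold plain text only and that byte-based substr() is acceptable.
function truncate_links(string $html, int $limit): string
{
    // Split into plain-text and complete <a ...>...</a> segments; the capture
    // group makes preg_split() keep the links in the result.
    $parts = preg_split('#(<a\b[^>]*>.*?</a>)#is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    $output    = '';
    $remaining = $limit;

    foreach ($parts as $part) {
        if (preg_match('#^(<a\b[^>]*>)(.*)</a>$#is', $part, $m)) {
            // Link segment: keep the opening tag, cut only the visible text.
            $visible = substr($m[2], 0, $remaining);
            $output .= $m[1] . $visible . '</a>';
        } else {
            // Plain-text segment.
            $visible = substr($part, 0, $remaining);
            $output .= $visible;
        }

        $remaining -= strlen($visible);
        if ($remaining <= 0) {
            break;
        }
    }

    return $output;
}

echo truncate_links('Check out our site at <a href="http://site.com/">site.com</a>', 25), "\n";
// Check out our site at <a href="http://site.com/">sit</a>
```

Because the link is kept whole as one segment, there is no need to remember placeholder lengths: the opening tag is reused as-is and only its text is shortened.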

Anyway, can anyone help me along with this?

EDIT: This only applies to <a> tags, since I run strip_tags() on everything else.

Upvotes: 0

Views: 528

Answers (3)

Kokos

In the end I wrote my own function. It could probably use some improvements, but it works:

private function substr_html($input,$limit){

    $original = $input;

    if(strlen($input) <= $limit)
        return $input;

    $pattern = '#<a\s+.*?href=[\'"]([^\'"]+)[\'"]\s*?.*?>((?:(?!</a>).)*)</a>#i';

    // Match all 'a' elements
    preg_match_all($pattern,$input,$matches);

    // If no links were found, perform a simple substr()
    if(count($matches[0]) == 0)
        return substr($input,0,$limit).'...';

    // Replace every link with a unique marker and split on it, so $input
    // holds the text between the links. preg_replace() returns the new
    // string, so its result must be assigned.
    $uni    = sha1(uniqid());
    $input  = preg_replace($pattern,$uni,$input);

    $input  = explode($uni,$input);
    $tmp    = $output = ''; // $tmp: visible text so far, $output: resulting HTML

    // Go through the split input
    foreach($input as $i){

        if(strlen($tmp.$i) < $limit){

            // If we can fit the next text value without reaching the limit, do it
            $tmp    .= $i;
            $output .= $i;

        }else{

            // Add whatever we can fit from the last text value and break the loop
            $diff    = abs($limit - strlen($tmp));
            $output .= substr($i,0,$diff);
            break;

        }

        // Do we still have room before we reach the limit, and are there links left?
        if(strlen($tmp) < $limit && !empty($matches[1])){

            $nextlink = array_shift($matches[1]);
            $nexttext = array_shift($matches[2]);

            // Links without visible text are dropped
            if(strip_tags($nexttext,$this->allowed_tags) != ''){

                if(strlen($tmp.$nexttext) < $limit){

                    // Add the next link if it fits
                    $tmp    .= $nexttext;
                    $output .= '<a href="'.$nextlink.'" target="_blank">'.$nexttext.'</a>';

                }else{

                    // Add whatever we can fit from the last link and break the loop
                    $diff    = abs($limit - strlen($tmp));
                    $output .= '<a href="'.$nextlink.'" target="_blank">'.substr($nexttext,0,$diff).'</a>';
                    break;

                }
            }
        }

    }

    // Trim string and replace line breaks with spaces
    $output = trim(preg_replace('/((<br>|<br\/>|<br \/>){1,})/'," ",$output));

    return $output.(strip_tags($original) != strip_tags($output) ? '...' : '');

}

Upvotes: 0

Audrey Delany

Another solution is PHP's strip_tags() function:

<?php
$text = '<p>Check out our site at </p><!-- other html stuff anywhere--> <a href="http://site.com/">site.com</a>';
echo strip_tags($text);
echo "\n";

// allow only <p> and <a>
echo strip_tags($text, '<p><a>');
?>
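To make the trade-off concrete, a small sketch combining strip_tags() with substr() on the question's string: removing every tag first means the cut can never leave an unclosed <a>, but the link itself is lost; keeping <a> in the allowed list brings the original problem back.

```php
<?php
$text = 'Check out our site at <a href="http://site.com/">site.com</a>';

// Strip every tag, then cut: safe, but the link is gone.
$plain = strip_tags($text);          // "Check out our site at site.com"
echo substr($plain, 0, 25), "\n";    // "Check out our site at sit"

// Keeping <a> leaves the markup in place, so substr() can still cut inside it.
$kept = strip_tags($text, '<a>');
echo substr($kept, 0, 50), "\n";     // 'Check out our site at <a href="http://site.com/">s'
```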

Upvotes: 0

alexn

This snippet from php.net/substr works great for this.

Example:

echo substrws("Check out our site at <a href=\"http://site.com/\">site.com</a>. It's really <strong>nice</strong>", 50);

Yields:

Check out our site at <a href="http://site.com/">site.com</a>.

Code:

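/**
* word-sensitive substring function with html tags awareness
* @param string $text The text to cut
* @param int    $len  The maximum length of the cut string (extended to the next space)
* @return string
**/
function substrws( $text, $len=180 ) {

    if( strlen($text) > $len ) {

        // cut at the first space at or after $len so no word is split
        $whitespaceposition = strpos($text," ",$len);

        if( $whitespaceposition !== false )
            $text = substr($text, 0, $whitespaceposition);

        // drop a tag that was cut in half, e.g. a trailing '<a'
        $text = preg_replace('|</?[a-zA-Z][^>]*$|', '', $text);

        // collect opening and closing tags; opening tags may carry attributes (<a href="...">)
        preg_match_all('|<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>|', $text, $aBuffer, PREG_SET_ORDER);

        $open = array();
        foreach( $aBuffer as $tag ) {

            $name = strtolower($tag[2]);

            // self-closing (<br />) and void elements never need a closing tag
            if( $tag[3] == '/' || in_array($name, array('br', 'hr', 'img', 'input', 'meta', 'link')) )
                continue;

            if( $tag[1] == '/' ) {
                // closing tag: forget the most recent matching opening tag
                $index = array_search($name, array_reverse($open, true), true);
                if( $index !== false )
                    array_splice($open, $index, 1);
            } else {
                $open[] = $name;
            }
        }

        // close unclosed html tags, innermost first
        foreach( array_reverse($open) as $tag )
            $text .= '</'.$tag.'>';
    }

    return $text;
}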

Upvotes: 1
