ZenMN

Reputation: 21

UTF-8 hex counted as single character with mb_substr?

First off: I'm a PHP novice. I'm trying to limit the length of titles in a WordPress theme using mb_substr, but it returns fewer characters than expected when the title contains certain symbols, such as "'" (apostrophe) or "-" (dash).

Here is the code I'm working with, limiting the title to 60 characters in total (ignore the fact that it appends three dots rather than a true ellipsis character):

    <?php
        $short_title = the_title('', '', false);
        $short_title_2 = mb_substr($short_title, 0, 60, 'utf-8');
    ?>
    <h3>
        <a href="<?php the_permalink(); ?>">
            <?php echo $short_title_2; if ($short_title_2 != $short_title) { echo "..."; } ?>
        </a>
    </h3>

So basically I want this to return the title truncated to 60 characters, but when the title contains punctuation or other special characters, each one seems to be counted as 6 extra characters (it must be counting their Unicode value or something?), meaning it actually returns only 54 visible characters.

Here's an example title with a dash character:

Competition - Win Tees from Listen To Your Eyes Clothing Now Ended

The code should return:

<h3>Competition - Win Tees from Listen To Your Eyes Clothing Now…</h3>

What it actually returns:

<h3>Competition – Win Tees from Listen To Your Eyes Clothi…</h3>

The database collation is set to utf8_general_ci (including the table that holds the title).

Is there any way I can overcome this?

Upvotes: 2

Views: 545

Answers (1)

Aurimas Ličkus

Reputation: 10074

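The title contains HTML entities, and each entity is counted as several characters. Decode the entities back to normal characters before truncating:

    $short_title_2 = mb_substr(html_entity_decode($short_title, ENT_QUOTES, 'UTF-8'), 0, 60, 'utf-8');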

http://php.net/manual/en/function.html-entity-decode.php
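
For a slightly fuller picture, here is a minimal sketch of the same idea wrapped in a helper. The function name `truncate_title` and its `$limit` parameter are made up for illustration and are not part of WordPress or PHP; the sketch also assumes PHP 7 or later with the mbstring extension, and that the title arrives with entities such as `&#8211;` in it, which is what the output in the question suggests.

```php
<?php
// Hypothetical helper (not a WordPress or PHP built-in):
// decode entities first, then truncate by character count.
function truncate_title(string $title, int $limit = 60): string
{
    // Turn entities such as &#8211; or &#039; into single UTF-8 characters,
    // so each one counts as one character instead of several.
    $decoded = html_entity_decode($title, ENT_QUOTES, 'UTF-8');

    // Short enough already: return the decoded title without the dots.
    if (mb_strlen($decoded, 'UTF-8') <= $limit) {
        return $decoded;
    }

    // Otherwise keep the first $limit characters and append the dots.
    return mb_substr($decoded, 0, $limit, 'UTF-8') . '...';
}
```

In the theme this could be called as `echo esc_html(truncate_title(the_title('', '', false)));`, with `esc_html()` re-escaping the decoded text for output. Checking the length of the decoded string, rather than comparing the truncated string with the original as in the question, avoids appending the dots every time an entity happens to have been decoded.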

Upvotes: 2
