Stoikidis

Reputation: 265

PHP function substr() error

When I use substr() I get a strange character at the end

$articleText = substr($articleText,0,500);

I have an output of 500 chars and � <--

How can I fix this? Is it an encoding problem? My language is Greek.

Upvotes: 26

Views: 23817

Answers (7)

Moussawi7

Reputation: 13267

Use this function; it worked for me:

function substr_unicode($str, $s, $l = null) {
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}

Credits: http://php.net/manual/en/function.mb-substr.php#107698
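As a quick sanity check, the helper can be compared against mb_substr() on a short Greek string. This is only a sketch: the sample string and variable names are made up for illustration, and it assumes the mbstring extension is loaded and the file is saved as UTF-8.

```php
<?php
// Copy of the helper from the answer, so the snippet runs on its own.
function substr_unicode($str, $s, $l = null) {
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}

$text = 'αβγδε'; // hypothetical sample: 5 Greek letters

// Take 3 characters starting at character offset 1, both ways.
$viaRegex = substr_unicode($text, 1, 3);
$viaMb    = mb_substr($text, 1, 3, 'UTF-8');

var_dump($viaRegex === $viaMb); // bool(true)
```

Both calls count in characters, so they agree here. If mbstring is available, mb_substr() is likely the simpler choice, since it avoids building an array of every character in the string.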

Upvotes: 1

GowriShankar

Reputation: 1654

You are cutting a multi-byte Unicode character in the middle. Instead of substr(), use mb_substr() in PHP.

substr()

substr ( string $string , int $start [, int $length ] )

mb_substr()

mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )

For more information on substr(), see the PHP manual.

Upvotes: 0

Kristoffer Bohmann

Reputation: 4094

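Alternative solution for UTF-8 encoded strings: convert the string from UTF-8 to single-byte ISO-8859-1 before cutting the substring. This only works for characters that exist in ISO-8859-1; any others, including Greek letters, are replaced with '?'.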

$articleText = substr(utf8_decode($articleText),0,500);

To get the articleText string back to UTF-8, an extra operation will be needed:

$articleText = utf8_encode( substr(utf8_decode($articleText),0,500) );
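The limits of this round trip can be seen with a small experiment. The sketch below uses mb_convert_encoding() as a stand-in for utf8_decode()/utf8_encode() (an assumption made here because those two functions are deprecated as of PHP 8.2); the sample strings and variable names are hypothetical, and the file is assumed to be saved as UTF-8.

```php
<?php
// Round trip UTF-8 -> ISO-8859-1 -> UTF-8 around a byte-based substr().
function latin1_round_trip($text, $length) {
    $latin1 = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
    $cut    = substr($latin1, 0, $length); // one byte per character here
    return mb_convert_encoding($cut, 'UTF-8', 'ISO-8859-1');
}

$french = 'café'; // every character exists in ISO-8859-1
$greek  = 'αβγ';  // no Greek letter exists in ISO-8859-1

$frenchBack = latin1_round_trip($french, 4);
$greekBack  = latin1_round_trip($greek, 3);

var_dump($frenchBack === $french); // bool(true): survives the round trip
var_dump($greekBack === $greek);   // bool(false): the Greek letters are lost
```

For Greek text, as in the question, a character-aware function such as mb_substr() avoids the lossy conversion altogether.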

Upvotes: 0

Dr Nick Engerer

Reputation: 785

mb_substr() also works well for removing strange trailing line breaks, which I was having trouble with after parsing HTML code. The problem was NOT handled by:

 trim() 

or:

 var_dump(preg_match('/^\n|\n$/', $variable));

or:

str_replace (array('\r\n', '\n', '\r'), ' ', $text)

None of these caught it.

Upvotes: 0

Uğur Özpınar

Reputation: 1043

Use mb_substr instead. It can handle multi-byte encodings, whereas substr only works correctly on single-byte strings:

$articleText = mb_substr($articleText,0,500,'UTF-8');

Upvotes: 20

deceze

Reputation: 522042

Looks like you're slicing a Unicode character in half there. Use mb_substr instead for Unicode-safe string slicing.

Upvotes: 6

Pascal MARTIN

Reputation: 400972

substr counts bytes, not characters.

Greek probably means you are using a multi-byte encoding such as UTF-8, and counting in bytes does not work well for those: a cut can land in the middle of a character.

Using mb_substr should help here: the mb_* functions were created specifically for multi-byte encodings.
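The difference between byte counting and character counting can be seen on a short Greek string. This is a minimal sketch with a made-up sample string and variable names; it assumes the mbstring extension is loaded and the file is saved as UTF-8.

```php
<?php
$text = 'αβγδε'; // 5 Greek letters, 2 bytes each in UTF-8

var_dump(strlen($text));             // int(10): bytes
var_dump(mb_strlen($text, 'UTF-8')); // int(5): characters

// Cutting at byte 3 splits the second letter in half.
$byteCut = substr($text, 0, 3);
// Cutting at character 3 keeps whole letters.
$charCut = mb_substr($text, 0, 3, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false): broken UTF-8
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
var_dump($charCut);                             // string(6) "αβγ"
```

The dangling half-character at the end of $byteCut is what a browser renders as �.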

Upvotes: 61
