Reputation: 781
I'm getting data from a MySQL database (a varchar(255) field with utf8_general_ci collation) and trying to write the text to a PDF with PHP. I need to determine the string length in the PDF to limit the output of the text in a table, but I noticed that the output of mb_substr/substr is really strange.
For example:
mb_internal_encoding("UTF-8");
$_tmpStr = $vfrow['title'];
$_tmpStrLen = mb_strlen($vfrow['title']);
for($i=$_tmpStrLen; $i >= 0; $i--){
file_put_contents('cutoffattributes.txt',$vfrow['field']." ".$_tmpStr."\n",FILE_APPEND);
file_put_contents('cutoffattributes.txt',$vfrow['field']." ".mb_substr($_tmpStr, 0, $i)."\n",FILE_APPEND);
}
outputs this:
Database:
My question is where does the extra character come from?
Upvotes: 4
Views: 1178
Reputation: 9428
The extra character is the first byte of a two-byte UTF-8 sequence. You may have a problem with the internal encoding used by the Multibyte String functions: your code treats the text as a fixed-width, 1-byte encoding. The ń in UTF-8 is the byte pair C5 84 (hex), which is read as two characters in a single-byte encoding: Ĺ„ in CP-1250, and Ĺ followed by the control character IND (0x84) in ISO-8859-2.
Try executing this at the top of your script:
mb_internal_encoding("UTF-8");
http://php.net/manual/en/function.mb-internal-encoding.php
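To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and uses a hypothetical sample title ("Poznań") rather than data from the question; the string is written with \x escapes so the bytes do not depend on how the script file itself is saved.

```php
<?php
// Hypothetical sample value: "Poznań" in UTF-8, where ń is the two bytes C5 84.
$title = "Pozna\xC5\x84";

mb_internal_encoding('UTF-8');

echo strlen($title), "\n";     // 7 (bytes)
echo mb_strlen($title), "\n";  // 6 (characters)

// Byte-based cut: stops between C5 and 84, leaving half of ń behind.
$byteCut = substr($title, 0, 6);
// Character-based cut: keeps the whole two-byte sequence.
$charCut = mb_substr($title, 0, 6);

echo bin2hex($byteCut), "\n";  // 506f7a6e61c5
echo bin2hex($charCut), "\n";  // 506f7a6e61c584

// The byte-based result is no longer valid UTF-8.
var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
```

The dangling C5 byte at the end of the byte-based cut is the kind of "extra character" that shows up when the output is later displayed in a single-byte encoding.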
Upvotes: 1
Reputation: 522024
You need to tell your mb_ functions that the data is in UTF-8 so they can treat it correctly. Either set this globally for all functions using mb_internal_encoding, or pass the $encoding parameter to your function when you call it:
mb_substr($_tmpStr, 0, $i, 'UTF-8')
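As a rough sketch of the per-call approach, the helper below (truncateUtf8 is a hypothetical name, not part of mbstring) always passes 'UTF-8' explicitly, so it does not depend on the global internal encoding. The second call passes a single-byte encoding on purpose to imitate what happens when the functions assume the wrong encoding. The sample string is an assumption, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper: truncate to at most $maxChars characters,
// always interpreting the input as UTF-8.
function truncateUtf8(string $text, int $maxChars): string
{
    return mb_substr($text, 0, $maxChars, 'UTF-8');
}

// Hypothetical sample value: "Poznański" in UTF-8 (ń = C5 84).
$title = "Pozna\xC5\x84ski";

// Correct encoding: the cut falls on a character boundary.
$good = truncateUtf8($title, 6);

// Wrong (single-byte) encoding: every byte counts as one character,
// so the cut falls in the middle of ń.
$bad = mb_substr($title, 0, 6, 'ISO-8859-1');

echo bin2hex($good), "\n"; // 506f7a6e61c584
echo bin2hex($bad), "\n";  // 506f7a6e61c5
```

Passing the encoding on every call is more verbose than calling mb_internal_encoding once, but it keeps the behavior stable even if other code changes the global setting.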
Upvotes: 1
Reputation: 412
Aside from the table and field being set to UTF-8, you also need to set the connection character set to UTF-8, for example with mysqli_set_charset($link, 'utf8'), where $link is your connection handle (if you are using mysqli).
Also, did you try this?
$_tmpStr = utf8_encode( $vfrow['title'] );
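One caveat worth checking before relying on utf8_encode: it converts from ISO-8859-1 to UTF-8, so it only helps if the data really arrives as ISO-8859-1. The sketch below uses mb_convert_encoding as a stand-in for the same conversion (utf8_encode is deprecated as of PHP 8.2) and a hypothetical one-character sample to show what happens when the input is already UTF-8. It assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical sample: ń already encoded as UTF-8 (bytes C5 84).
$alreadyUtf8 = "\xC5\x84";

// Same conversion utf8_encode performs: treat input as ISO-8859-1, emit UTF-8.
$reEncoded = mb_convert_encoding($alreadyUtf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($alreadyUtf8), "\n";             // c584
echo bin2hex($reEncoded), "\n";               // c385c284
echo mb_strlen($alreadyUtf8, 'UTF-8'), "\n";  // 1
echo mb_strlen($reEncoded, 'UTF-8'), "\n";    // 2
```

If the string grows from one character to two like this, the data was already UTF-8 and the extra conversion double-encodes it.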
Upvotes: 0