Reputation: 312
My PHP script reads some product attributes from a database. The text is read as UTF-8.
For the purposes of testing: as it parses the data, I output some of it to a browser, which renders it perfectly:
Notre Protéine de Soja 90 en poudre fournit plus de 90% de protéines de soja par 100g (base sèche) vérifié par les derniers résultats des tests indépendants réalisés sur nos produits.
Then I tried writing it to a file using PHP, like so:
file_put_contents($filename, utf8_encode($data));
and
file_put_contents($filename, $data);
and
$handle = fopen($filename, 'w');
fwrite($handle,utf8_encode($data));
fclose($handle);
and
$handle = fopen($filename, 'w');
fwrite($handle,$data);
fclose($handle);
For some reason, when I write the data to the file and then view the file, the data changes to this:
Notre Protéine de Soja 90 en poudre fournit plus de 90% de protéines de soja par 100g (base sèche) vérifié par les derniers résultats des tests indépendants réalisés sur nos produits.
**The main issue is that the French accents get changed (the slanted lines above vowels).**
I thought the file might somehow be in a different encoding, so on the command line I did the following:
php > $e = file_get_contents('filename.csv');
php > echo mb_detect_encoding($e);
UTF-8
So the file is UTF-8 encoded, which is also the encoding of the text when I output it in the browser. Does this mean the changes to the text are not an encoding issue? If not, what are they?
Upvotes: 1
Views: 83
Reputation: 177
It looks like the data is entity encoded, meaning that any special characters with equivalent HTML entities are translated. This is to display the characters correctly on a web page.
My guess is that the strings you receive from the database are entity encoded in the database on purpose, and that when you display them, they show up as they should (because the browser decodes the entities), but in a text file you can see the entities.
I would say there is no problem here! But if you want an entity-free string, you can run it through html_entity_decode().
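As a minimal sketch of that suggestion, assuming the database value contains named entities such as `&eacute;` (the sample string and the output path below are made up for illustration, not taken from the question):

```php
<?php
// Hypothetical sample value; assumes the stored text uses named HTML entities.
$data = 'Notre Prot&eacute;ine de Soja 90 en poudre (base s&egrave;che)';

// Turn the entities back into real characters, emitted as UTF-8.
$decoded = html_entity_decode($data, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Hypothetical output path, used only for this example.
$filename = sys_get_temp_dir() . '/decoded_sample.txt';
file_put_contents($filename, $decoded);

echo $decoded, PHP_EOL;
```

Passing the encoding explicitly avoids depending on the `default_charset` setting, and there should be no need for `utf8_encode()` if the decoded string is already UTF-8.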
Edit: deceze's answer explains this even better!
Upvotes: 3
Reputation: 522016
The sequence your file shows in place of é (for example `&eacute;`) is an HTML entity, meaning "special" characters in the text are HTML-encoded. This has nothing to do with UTF-8, utf8_encode or file_put_contents; none of these functions will HTML-encode a string.
More than likely the original data in your database is HTML-encoded, and you have not noticed this before putting the contents into a file, because outputting HTML entities to a browser will render those entities as the regular characters they represent.
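One way to check this for yourself is sketched below. The stored value and the temporary file are assumptions for illustration only. The idea is that the bytes survive the file round trip unchanged, that decoding alters the string (so entities were already present), and that the entity-encoded text is plain ASCII and therefore still valid UTF-8, which is why an encoding check cannot reveal the problem.

```php
<?php
// Hypothetical value as it might come from the database.
$fromDb = 'Prot&eacute;ine';

// Write to a temporary file and read it back.
$path = tempnam(sys_get_temp_dir(), 'ent');
file_put_contents($path, $fromDb);
$roundTrip = file_get_contents($path);
unlink($path);

// file_put_contents() stores the bytes exactly as given.
$unchangedByFile = ($roundTrip === $fromDb);

// If decoding changes the string, it contained HTML entities to begin with.
$hasEntities = (html_entity_decode($fromDb, ENT_QUOTES | ENT_HTML5, 'UTF-8') !== $fromDb);

// Entities consist of ASCII characters only, so the text is also valid UTF-8.
$validUtf8 = mb_check_encoding($fromDb, 'UTF-8');

var_dump($unchangedByFile, $hasEntities, $validUtf8);
```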
Upvotes: 2