jason

Reputation: 3615

PHP wrong character set

I am trying to pull data from a table and output it as a text (RTF) file. The problem is that there are some characters in the content that get mangled. For instance, if I have Spanish content, some of the characters are not recognized and get changed. For example, if I have:

'implementación'

the word gets changed to:

'implementaciÃ³n'

Using breakpoints, I can see that the string coming from the database is correct; it is only when it gets printed that the accented character gets changed. Below is my code:

           header("Content-Type: application/rtf; charset=utf-8;");
           header("Cache-Control: public");
           header("Content-Description: File Transfer");
           header("Content-Disposition: attachment; filename=".$fileName .".rtf");
           header("Content-Transfer-Encoding: binary");

           echo $content;

Thanks for your help.

jason

Upvotes: 1

Views: 3904

Answers (2)

Will B.

Reputation: 18416

Match the output character set to the table's character set, or convert the data from the table's character set to the one you want to output.

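For example, assuming the table stores its data as ISO-8859-1 (Latin-1) and we want to output it as UTF-8:

$content = iconv( 'ISO-8859-1', 'UTF-8//TRANSLIT//IGNORE', $content );
echo $content;

//TRANSLIT approximates characters that the output character set cannot represent (for example, € becomes EUR when the target is ASCII), and //IGNORE drops characters that cannot be converted at all. Both matter mainly when converting to a narrower character set; UTF-8 can represent every character.

The first argument must name the encoding the data is really in. US-ASCII is not a useful choice for accented text, because it contains no accented characters. If the table uses MySQL's latin1, try Windows-1252 (CP1252), which is what MySQL's latin1 actually is. CP850 (Code Page 850, MS-DOS Latin-1) only applies to data that originated in a DOS code page.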
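The conversion step can be wrapped so that text which is already valid UTF-8 is not converted a second time. This is a minimal sketch, not a definitive implementation: the helper name `toUtf8` is hypothetical, and the ISO-8859-1 default is an assumption that should be replaced with whatever encoding the table really uses.

```php
<?php
// Hypothetical helper: convert $text from $sourceCharset to UTF-8.
// The default source charset is an assumption; use the table's real encoding.
function toUtf8(string $text, string $sourceCharset = 'ISO-8859-1'): string
{
    // preg_match with the /u flag succeeds only on valid UTF-8 input.
    if (preg_match('//u', $text) === 1) {
        return $text; // already valid UTF-8; converting again would double-encode it
    }

    $converted = iconv($sourceCharset, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $sourceCharset to UTF-8");
    }
    return $converted;
}

// "implementación" as Latin-1 bytes: ó is the single byte 0xF3.
$latin1 = "implementaci\xF3n";
$utf8   = toUtf8($latin1); // ó is now the two bytes 0xC3 0xB3
```

Pure ASCII text passes the validity check and is returned unchanged, so the helper is safe to call on every field before `echo`.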

https://www.php.net/manual/en/function.iconv.php

You can also convert the encoding within the query itself. For example, with MySQL:

SELECT convert(cast(convert(content using  latin1) as binary) using utf8) AS content

See also: "Convert latin1 characters on a UTF8 table into UTF8"

This is useful if the data was sent to the database in a different character set than the one the connection or table expected, for example UTF-8 bytes sent over a latin1 connection into a column that uses a UTF-8 character set.

To find out the table's character encoding, try:

SHOW CREATE TABLE `tablename`;

or see "MySQL: Get character-set of database or table or column?":

For table encoding:

SELECT CCSA.character_set_name FROM information_schema.`TABLES` T,
       information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
WHERE CCSA.collation_name = T.table_collation
  AND T.table_schema = "schemaname"
  AND T.table_name = "tablename";

For column encoding:

SELECT character_set_name FROM information_schema.`COLUMNS` C
WHERE table_schema = "schemaname"
  AND table_name = "tablename"
  AND column_name = "columnname";

Alternatively, you can change the charset in the PHP Content-Type header to match the encoding the database actually returns.

header("Content-Type: application/rtf; charset=ISO-8859-1");

Upvotes: 1

Phil Perry

Reputation: 2130

  1. Check that your database text is defined as UTF-8 (preferably, all text in the database should use the same encoding).
  2. Check that your page output is UTF-8 and not the default Latin-1/ISO-8859-1 (or another single-byte encoding, such as Windows-1252).
  3. Go into phpMyAdmin and browse the table's data to make sure the data was actually received and stored as UTF-8. You will need to check that the phpMyAdmin browse page itself is displaying in UTF-8.
  4. If the table/field is UTF-8 and the page is UTF-8, but you still get the two characters, it is very likely that a UTF-8 backup (.sql file) was imported as Latin-1 rather than UTF-8, so the two bytes of ó were each translated into a separate UTF-8 multibyte character. When you import an .sql file, you have to tell phpMyAdmin which encoding the file uses. This is difficult to clean up, especially if you now have a mixture of encodings in your database.
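The double encoding described in point 4 can be reproduced and reversed in PHP. This is a minimal sketch under stated assumptions: the text was mis-read as ISO-8859-1 exactly once, and the function name `undoDoubleEncoding` is hypothetical. Data that passed through MySQL's latin1 (Windows-1252) may need that charset name instead.

```php
<?php
// Simulate the bad import: UTF-8 bytes are read as ISO-8859-1 and re-encoded as UTF-8.
$original = "implementaci\xC3\xB3n";                  // "implementación" in UTF-8
$mangled  = iconv('ISO-8859-1', 'UTF-8', $original);  // displays as "implementaciÃ³n"

// Hypothetical repair: undo one round of mis-decoding, but only when the
// result is provably the inverse of the damage and is itself valid UTF-8.
function undoDoubleEncoding(string $text): string
{
    // Map each character back to the single byte it was mis-read from.
    $candidate = @iconv('UTF-8', 'ISO-8859-1', $text);
    if ($candidate === false || $candidate === $text) {
        return $text; // not convertible, or pure ASCII: nothing to repair
    }
    // Re-applying the damage must reproduce the input exactly.
    if (iconv('ISO-8859-1', 'UTF-8', $candidate) !== $text) {
        return $text;
    }
    // Correctly encoded text fails here, because its Latin-1 form is not valid UTF-8.
    return preg_match('//u', $candidate) === 1 ? $candidate : $text;
}

$repaired = undoDoubleEncoding($mangled);
```

The validity check is what makes the function safe to run over a mixture of encodings: rows that were never damaged are returned unchanged.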

Upvotes: 0
