Reputation: 119
My database is in latin1 and collation is latin1_swedish_ci. Also in my php file I'm using iso-8859-1. But If I store some characters like 'é' and some others I'm having problem in dowloading the content. So we have added the default character-set to utf8 and skip-character-set-client-handshake in my.cnf file. Also added 'set-name utf8' in query soon after every connection establishment and before executing any query. This solved the issue. But I have done some observations as below
Case 1:
Mysql conf file: No default character-set and no skip-character-set-client-handshake
Query: No set names
Result:
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name | Value |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client | latin1 |<br/>
| character_set_connection | latin1 |<br/>
| character_set_database | latin1 |<br/>
| character_set_filesystem | binary |<br/>
| character_set_results | latin1 |<br/>
| character_set_server | latin1 |<br/>
| character_set_system | utf8 |<br/>
| character_sets_dir | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
stored ->é(utf8)(Hex->C3A9)<br/>
Case2:
Mysql conf file: Default character-set - utf8 and no skip-character-set-client-handshake
Query: No set names<br/>
Result:
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name | Value |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client | latin1 |<br/>
| character_set_connection | latin1 |<br/>
| character_set_database | utf8 |<br/>
| character_set_filesystem | binary |<br/>
| character_set_results | latin1 |<br/>
| character_set_server | utf8 |<br/>
| character_set_system | utf8 |<br/>
| character_sets_dir | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
<br/>
stored ->é(utf8)(Hex->C3A9)<br/>
Case3:
Mysql conf file: Default character-set - utf8 and skip-character-set-client-handshake
Query: No set names<br/>
Result:
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name | Value |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client | utf8 |<br/>
| character_set_connection | utf8 |<br/>
| character_set_database | utf8 |<br/>
| character_set_filesystem | binary |<br/>
| character_set_results | utf8 |<br/>
| character_set_server | utf8 |<br/>
| character_set_system | utf8 |<br/>
| character_sets_dir | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
8 rows in set (0.00 sec)<br/>
stored ->é(latin1)(Hex->E9)<br/>
Case4:
Mysql conf file: no Default characterset - utf8 and skip-character-set-client-handshake
Query: No set names<br/>
Result:
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name | Value |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client | utf8 |<br/>
| character_set_connection | utf8 |<br/>
| character_set_database | utf8 |<br/>
| character_set_filesystem | binary |<br/>
| character_set_results | utf8 |<br/>
| character_set_server | utf8 |<br/>
| character_set_system | utf8 |<br/>
| character_sets_dir | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
8 rows in set (0.00 sec)<br/>
stored ->é(utf8)(Hex->C3A9)<br/>
Case5:
Mysql conf file: Default characterset - utf8 and skip-character-set-client-handshake
Query: set names utf8<br/>
Result:
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name | Value |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client | utf8 |<br/>
| character_set_connection | utf8 |<br/>
| character_set_database | utf8 |<br/>
| character_set_filesystem | binary |<br/>
| character_set_results | utf8 |<br/>
| character_set_server | utf8 |<br/>
| character_set_system | utf8 |<br/>
| character_sets_dir | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
8 rows in set (0.00 sec)<br/>
stored ->é(latin1)(Hex->E9)<br/>
Case6:
Mysql conf file: Default characterset - utf8 and no skip-character-set-client-handshake
Query: set names utf8<br/>
Result:
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name | Value |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client | latin1 |<br/>
| character_set_connection | latin1 |<br/>
| character_set_database | utf8 |<br/>
| character_set_filesystem | binary |<br/>
| character_set_results | latin1 |<br/>
| character_set_server | utf8 |<br/>
| character_set_system | utf8 |<br/>
| character_sets_dir | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
8 rows in set (0.00 sec)<br/>
stored ->é(latin1)(Hex->E9)<br/>
Case7:
Mysql conf file: no Default characterset and no skip-character-set-client-handshake
Query: set names utf8<br/>
Result:
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name | Value |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client | latin1 |<br/>
| character_set_connection | latin1 |<br/>
| character_set_database | latin1 |<br/>
| character_set_filesystem | binary |<br/>
| character_set_results | latin1 |<br/>
| character_set_server | latin1 |<br/>
| character_set_system | utf8 |<br/>
| character_sets_dir | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
8 rows in set (0.00 sec)<br/>
stored ->é(latin1)(Hex->E9)<br/>
Case8:
Mysql conf file: no Default characterset and skip-character-set-client-handshake
Query: set names utf8<br/>
Result:
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name | Value |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client | latin1 |<br/>
| character_set_connection | latin1 |<br/>
| character_set_database | latin1 |<br/>
| character_set_filesystem | binary |<br/>
| character_set_results | latin1 |<br/>
| character_set_server | latin1 |<br/>
| character_set_system | utf8 |<br/>
| character_sets_dir | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
8 rows in set (0.00 sec)<br/>
stored ->é(latin1)(Hex->E9)<br/>
Output containing all the 8 cases together<br/>
+-----------+------------------------------------------------------------------+<br/>
| HEX(name) | desc |<br/>
+-----------+------------------------------------------------------------------+<br/>
| C3A9 | no skip handshake and no default in conf and nothing in query |<br/>
| C3A9 | no skip handshake and default utf8 in conf and nothing in query |<br/>
| E9 | skip handshake and default utf8 in conf and nothing in query |<br/>
| C3A9 | skip handshake and no default in conf and nothing in query |<br/>
| E9 | skip handshake and default utf8 in conf and utf8 in query |<br/>
| E9 | no skip handshake and default utf8 in conf and utf8 in query |<br/>
| E9 | no skip handshake and no default in conf and utf8 in query |<br/>
| E9 | skip handshake and no default in conf and utf8 in query |<br/>
+-----------+------------------------------------------------------------------+<br/>
On what basis the data is being stored in database? Sometimes its stored in latin1 format and sometimes in utf8. Is it based on options(I mean variables like character_set_client, character_set_server etc...,) or On my.cnf configuration??
Considering all the 8 cases I'm not getting a conclusion on this. I have also gone through the explaination on 'SET Names', 'skip-character-set-client-handshake'. But still in confusion. Is there any conversion happens while storing or just while displaying??
$conn = mysqli_connect('<host>', '<username>', '<password>', 'table');<br/>
mysqli_query($conn, "SET NAMES 'utf8';");<br/>
mysqli_query($conn, 'insert into router.test values ("é");');<br/>
Thanks in advance for the replies.
Upvotes: 1
Views: 1596
Reputation: 119
Thanks for the response and links @Rick James.
Actually I got the answer for my question from some other link and also from above shared link by Rick James. This satisfies all the above 8 cases.
If there is a discrepancy between the connection(utf8) and the storage character-set(latin1), MySQL converts the content from one encoding to another.
Case 1: Here we are actually writing UTF-8 characters to the latin1 database, each of the UTF-8 byte sequences will be interpreted as separate latin1 characters. Each byte composing an UTF-8 character(é) is interpreted as a separate latin1 characters and each character is converted to UTF-8 when writing to the table(à © -> Ã © -> c3 A9)
i.e., the string inserted is é, in UTF-8 (é) the character is represented as two bytes, hexadecimally denoted as C3A9. As we insert UTF-8 data into this table it just treats the two bytes sent as two latin1 characters, does no conversion, and saves them as such into the table. Before doing SET-NAMES to utf8, its inserting in the format C3A9
mysql> insert into test.test values ('é');
Query OK, 1 row affected (0.00 sec)
mysql> select hex(text) from test;
+-----------+
| hex(text) |
+-----------+
| C3A9 |
+-----------+
1 row in set (0.00 sec)
Case 2: When we do SET-NAMES to utf8 whatever data we try to insert to table it will consider it as utf8, as the targeted column(text) is of type latin1, it will convert the utf8 data(2bytes -> C3E9) to latin1 (1byte -> E9).
mysql> set names 'utf8';
Query OK, 0 rows affected (0.00 sec)
mysql> insert into test.test values ('é');
Query OK, 1 row affected (0.00 sec)
mysql> select hex(text) from test;
+-----------+
| hex(text) |
+-----------+
| C3A9 |
| E9 |
+-----------+
2 rows in set (0.00 sec)
So after doing SET-NAMES to utf8 whatever data sent from client especially characters like é are converted and stored in latin1 format itself and not treated as 2 separate latin1 characters.
Also while fetching the data it will convert back to its original format.
SET NAMES indicates what character-set the client will use to send SQL statements to the server. Thus, SET NAMES 'utf8' tells the server, “future incoming messages from this client are in character-set utf8.” It also specifies the character-set that the server should use for sending results back to the client.
A SET NAMES 'charset_name' statement is equivalent to these three statements:
SET character_set_client = charset_name;
SET character_set_results = charset_name;
SET character_set_connection = charset_name;
Upvotes: 0
Reputation: 142528
Your client sends either C3A9
or E9
-- independent of anything that MySQL says.
my.cnf
and/or SET NAMES
and/or the connection parameters determine the 3 of the values in SHOW VARIABLES LIKE 'character_set_%'
.
Those settings say whether to interpret C3A9
or E9
as latin1
or utf8
:
C3A9
interpreted as utf8: good (é
)
C3A9
interpreted as latin1: Mojibake (é
)
E9
interpreted as latin1: good (é
)
E8
interpreted as utf8: String is truncated due to non-utf8 byte
But we are not finished...
If you are INSERTing
, then the Server looks at the encoding of the target column and converts from the above (latin1 or utf8) to the column's declaration. If same, no conversion needed, if different, conversion occurs during store. I'm surprised you didn't stumble on "double encoding" wherein é
is stored. This happens when utf8 bytes (C3A9
) are mis-declared to be latin1
via SET NAMES
(etc), then stored into a utf8 column (hence another conversion).
More discussion: Trouble with UTF-8 characters; what I see is not what I stored and http://mysql.rjweb.org/doc.php/charcoll
Upvotes: 1
Reputation: 108841
My database is in latin1 and collation is latin1_swedish_ci.
Those are default settings. After you've created columns and tables, each column has a character set and a collation. Changing the defaults does not change existing column and table definitions.
in my php file I'm using iso-8859-1.
That is ok. Latin1 === iso-8859-1
But If I store some characters like 'é' and some others I'm having problem in dowloading the content.
I guess you mean the two-step process of retrieving the content from your table and then sending it to a web browser for rendering. It's the second step that might be wrong. Try setting this in the head section in your php / html file.
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">
If you set that and everything renders correctly, you're done. You can also set it to unicode and see what happens. If you don't set it, browsers try to guess, using arcane rules.
So we have added the default character-set to utf8 and skip-character-set-client-handshake in my.cnf file.
Again, that doesn't change your existing columns in your tables.
On what basis the data is being stored in database?
Again, according to the character set chosen for each column.
Sometimes its stored in latin1 format and sometimes in utf8.
It's true.
Best practice: mention the character set and collation for each table as you create it. If you have exceptions for some columns, mention them for the columns as you create them
Best practice: for new databases use utfmb4.
Best practice: always set your database connection character set.
Best practice: read about how to create and manipulate unicode strings in php.
Best practice: when standing up a new MySQL server, set the server-wide defaults to utfmb4 and utfmb4_general_ci.
Unfortunately, migrating an existing php application to unicode can be a pain in the neck.
Upvotes: 2