MySQL command-line table column width with utf8

Question

Why mysql command-line outputs utf8 columns twice as wide compared to non-utf columns? Example:

$ mysql -u user --default-character-set=utf8
mysql> select "αβγαβγαβγαβγαβγαβγαβγ";
+--------------------------------------------+
| αβγαβγαβγαβγαβγαβγαβγ                      |
+--------------------------------------------+
| αβγαβγαβγαβγαβγαβγαβγ                      |
+--------------------------------------------+
1 row in set (0.00 sec)

mysql> select "abcabcabcabcabcabcabc";
+-----------------------+
| abcabcabcabcabcabcabc |
+-----------------------+
| abcabcabcabcabcabcabc |
+-----------------------+
1 row in set (0.00 sec)

As you can see, first table has column twice as wide compared to second table, and this often breaks formatting when lines start to get more than half-screen wide.

I tried this on MySQL 14.14 and MariaDB 15.1.

Is there a way to output utf8 columns with the same width as non-utf?

edit:

MariaDB [(none)]> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

Bill Karwin · Accepted Answer

In the source code for mysql.cc (the source for the mysql client) there is an explanation in the comment block for function get_field_disp_length() which is used in the formatting of result set output.

Return the length of a field after it would be rendered into text.

This doesn't know or care about multibyte characters. Assume we're using such a charset. We can't know that all of the upcoming rows for this column will have bytes that each render into some fraction of a character. It's at least possible that a row has bytes that all render into one character each, and so the maximum length is still the number of bytes. (Assumption 1: This can't be better because we can never know the number of characters that the DB is going to send -- only the number of bytes. 2: Chars <= Bytes.)

In other words, since UTF8 can store characters that are 1 byte per character (like Latin characters), and the result can't know what the data is before it fetches it to display, it must assume any or all characters may be one byte per character.

The story might be different if you used a character set that uses a constant 2 bytes per character, like UCS-2. But I have never heard of anyone using UCS-2, since MySQL supports variable-length Unicode encodings.

MySQL command-line table column width with utf8

Answers (1)

Related Questions