Reputation: 5463
I have an older MYSQL 5.6.34 database that was created several years ago (not by me). I've taken one of the databases and started building tables in it for use in my applications when I noticed some of the other tables have a latin1
encoding. Then I noticed all the databases, with the exception of the information_schema
all use latin1
.
SCHEMA_NAME DEFAULT_CHARACTER_SET_NAME DEFAULT_COLLATION_NAME
information_schema utf8 utf8_general_ci
443347_pxa latin1 latin1_swedish_ci
443348_srp latin1 latin1_swedish_ci
warehouseinventory latin1 latin1_swedish_ci
I'm not sure why the original developer used latin1 for encoding and collation. There's no reason why really.
I'm don't really want to change any encoding on any existing databases, I just want to create new tables with utf8mb4. But the more I think about it, to stay consistent I'd maybe want to change them. The last database I created was warehouseinventory
but I didn't notice the encoding, and I assume it defaulted to latin1
based on the encoding of the previously created databases.
Is there any reason why I'd leave these at latin1
and would it hurt to change them? One of the databases does have spanish language stored in it, but I always thought that was just UTF.
Upvotes: 1
Views: 5136
Reputation: 142238
If you have any indexes on column(s) that are declared VARCHAR(255)
, you could get an error about "index bigger than 767 bytes" when converting to utf8mb4. There are 5 workarounds .
If you don't have any varchars between 191 and 255 characters that are indexed, then
ALTER TABLE tbl CONVERT TO utf8mb4;
will convert all columns in the table tbl
to utf8mb4.
If you happen to have incorrectly stored utf8 bytes into latin1 in "double encoding", you need a different fix. Review Fixes for various Cases .
The "default" character set or collation is just that -- a default. That is, when a new column or table is created, it takes on the default. If you explicitly specify charset and/or collation for a column, that overrides the default for the table.
There's an old saying: "If it ain't broke, don't fix it." I lean toward making new databases/tables/columns utf8mb4, but leaving the old ones alone. Note: Having a mixture is OK. The connection specifies what encoding the client uses. MySQL, during INSERT
or SELECT
will convert from/to the client's settings to/from the column's settings.
Spanish has a limited set of characters. That set is a subset of latin1, utf8, and utf8mb4, so any of those charsets work with Spanish. However, the encoding is different. For example, ñ
is
Hex F1 in latin1
Hex C3B1 in utf8 and utf8mb4
Similarly for the rest of Western Europe. Asia needs at least utf8. Emoji and some Chinese characters need utf8mb4.
If you end up with question marks, Mojibake, etc, debug it here .
Terminology: Outside MySQL: UTF-8
; Inside MySQL: utf8mb4
. Those are essentially the same. MySQL's utf8
is a subset of them.
Tést¥
, in hex:
54 E9 73 74 A5 -- cp1256, dec8, latin1, latin5 encoding
54 C3A9 73 74 C2A5 -- utf8, utf8mb4 encoding
54 C383 C2A9 73 74 C382 C2A5 -- "double encoded"; may show as Tést¥
Upvotes: 3