Reputation: 4462
I needed a MySQL function which trims all whitespace from beginning and end of a string (including newlines and tabs), so I created the following function:
DROP FUNCTION IF EXISTS WSTrim;
DELIMITER $$
CREATE FUNCTION `WSTrim`(text LONGTEXT CHARSET utf8) RETURNS LONGTEXT CHARSET utf8
NO SQL
DETERMINISTIC
BEGIN
RETURN TRIM(REPLACE(REPLACE(REPLACE(text, '\t', ' '), '\r', ' '), '\n', ' '));
END$$
DELIMITER ;
Despite what it says in parameter declaration and in the specification of the return value, this function messes up my character set, which I would want to be utf8
. If I call this function like so:
INSERT INTO mytable (name) VALUES (WSTrim(' ČĆŽŠĐ čćžšđ '));
I get a warning
1 row(s) affected, 1 warning(s): 1366 Incorrect string value: '\xC4\x8C\xC4\x86\xC5\xBD...' for column 'name' at row 1
How do I change my function to not mess up my characters?
Edit 1:
The connection string I'm using is
Server=myserver;Port=myport;Database=mydb;Uid=myuid;Pwd=mypwd;CharSet=utf8
Also, for completeness,
mysql> show variables where Variable_name like 'character%' or Variable_name like 'collation%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
| collation_connection | utf8_general_ci |
| collation_database | utf8_unicode_ci |
| collation_server | utf8_unicode_ci |
+--------------------------+----------------------------+
Note, the same thing happens even if I connect with MySQL Workbench, not just my application.
Edit 2:
INSERT INTO mytable (name) VALUES (' ČĆŽŠĐ čćžšđ ');
This inserts the value correctly.
Edit 3:
mysql> SELECT WSTrim(' ČĆŽŠĐ čćžšđ ');
+-----------------------------------+
| WSTrim(' ČĆŽŠĐ čćžšđ ') |
+-----------------------------------+
| ??ŽŠ? ??žš? |
+-----------------------------------+
1 row in set, 1 warning (0.01 sec)
It doesn't matter how I send the query. However, I've noticed if I type in
show function status;
one of the rows returned is:
+------+--------+----------+----------------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+
| Db | Name | Type | Definer | Modified | Created | Security_type | Comment | character_set_client | collation_connection | Database Collation |
+------+--------+----------+----------------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+
| mydb | WSTrim | FUNCTION | mydb@localhost | 2015-11-23 04:01:06 | 2015-11-23 04:01:06 | DEFINER | | latin1 | latin1_swedish_ci | latin1_swedish_ci |
+------+--------+----------+----------------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+
For some reason it treats the function with latin1
. But how do I change this?
Edit 4:
mysql> show create database mydb;
+----------+---------------------------------------------------------------------------------------+
| Database | Create Database |
+----------+---------------------------------------------------------------------------------------+
| mydb | CREATE DATABASE `mydb` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci */ |
+----------+---------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
Upvotes: 1
Views: 2030
Reputation: 42716
According to the documentation, the character_set_client
and collation_connection
columns from SHOW FUNCTION STATUS
show that you were not connected with utf8 during creation of the function. Likely you'll need to check connection settings to make sure that you don't have latin1 defined somewhere.
As well, the Database Collation
column indicates the database itself is defined as latin1.
Upvotes: 3