Nikola Novak
Nikola Novak

Reputation: 4462

How to make this MySQL function not mess up my character set?

I needed a MySQL function which trims all whitespace from beginning and end of a string (including newlines and tabs), so I created the following function:

DROP FUNCTION IF EXISTS WSTrim;
DELIMITER $$
CREATE FUNCTION `WSTrim`(text LONGTEXT CHARSET utf8) RETURNS LONGTEXT CHARSET utf8
    NO SQL
    DETERMINISTIC
BEGIN
 RETURN TRIM(REPLACE(REPLACE(REPLACE(text, '\t', ' '), '\r', ' '), '\n', ' '));
END$$
DELIMITER ;

Despite what it says in parameter declaration and in the specification of the return value, this function messes up my character set, which I would want to be utf8. If I call this function like so:

INSERT INTO mytable (name) VALUES (WSTrim(' ČĆŽŠĐ čćžšđ   '));

I get a warning

1 row(s) affected, 1 warning(s): 1366 Incorrect string value: '\xC4\x8C\xC4\x86\xC5\xBD...' for column 'name' at row 1

How do I change my function to not mess up my characters?

Edit 1:

The connection string I'm using is

Server=myserver;Port=myport;Database=mydb;Uid=myuid;Pwd=mypwd;CharSet=utf8

Also, for completeness,

mysql> show variables where Variable_name like 'character%' or Variable_name like 'collation%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
| collation_connection     | utf8_general_ci            |
| collation_database       | utf8_unicode_ci            |
| collation_server         | utf8_unicode_ci            |
+--------------------------+----------------------------+

Note, the same thing happens even if I connect with MySQL Workbench, not just my application.

Edit 2:

INSERT INTO mytable (name) VALUES (' ČĆŽŠĐ čćžšđ   ');

This inserts the value correctly.

Edit 3:

mysql> SELECT WSTrim(' ČĆŽŠĐ čćžšđ ');
+-----------------------------------+
| WSTrim(' ČĆŽŠĐ čćžšđ ')           |
+-----------------------------------+
| ??ŽŠ? ??žš?                       |
+-----------------------------------+
1 row in set, 1 warning (0.01 sec)

It doesn't matter how I send the query. However, I've noticed if I type in

show function status;

one of the rows returned is:

+------+--------+----------+----------------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+
| Db   | Name   | Type     | Definer        | Modified            | Created             | Security_type | Comment | character_set_client | collation_connection | Database Collation |
+------+--------+----------+----------------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+
| mydb | WSTrim | FUNCTION | mydb@localhost | 2015-11-23 04:01:06 | 2015-11-23 04:01:06 | DEFINER       |         | latin1               | latin1_swedish_ci    | latin1_swedish_ci  |
+------+--------+----------+----------------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+

For some reason it treats the function with latin1. But how do I change this?

Edit 4:

mysql> show create database mydb;
+----------+---------------------------------------------------------------------------------------+
| Database | Create Database                                                                       |
+----------+---------------------------------------------------------------------------------------+
| mydb     | CREATE DATABASE `mydb` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci */ |
+----------+---------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

Upvotes: 1

Views: 2030

Answers (1)

miken32
miken32

Reputation: 42716

According to the documentation, the character_set_client and collation_connection columns from SHOW FUNCTION STATUS show that you were not connected with utf8 during creation of the function. Likely you'll need to check connection settings to make sure that you don't have latin1 defined somewhere.

As well, the Database Collation column indicates the database itself is defined as latin1.

Upvotes: 3

Related Questions