jasin_89
jasin_89

Reputation: 2001

MySQL decode Unicode to UTF-8 function

I want to decode Unicode strings to UTF-8 when inserting in a table. Here's what I have:

('\u0645\u064e\u062b\u0652\u0646\u064e\u0649 \u00a0\u062c \u0645\u064e\u062b\u064e\u0627\u0646\u064d')

So I want these values to be converted to UTF-8, for example:

INSERT INTO `nouns`(`NOUNID`, `WORDID`, `SINGULAR`, `PLURAL`) VALUES (781, 3188, '\u0646\u064e\u062c\u0652\u0645', ('\u0646\u064e\u062c\u0652\u0645'))

I am migrating my h2 database to MySQL, so I got this when scripting my h2 db:

INSERT INTO PUBLIC.NOUNS(NOUNID, WORDID, SINGULAR, PLURAL) VALUES
  (1, 5, STRINGDECODE('\u0623\u0628\u0651 '), STRINGDECODE
  ('\u0623\u0624\u064f\u0628\u0651')),
  (2, 9, STRINGDECODE('\u0623\u064e\u0628\u0627\u0628'), ''),

basicly thees \u0623\u0632\u0651 are arabic charchters in unicode representation and I want to convert them in real arabic characters, to be stored like that in database. I was trying convert function, but because I am new to mysql I couldn't achive this:

SELECT CONVERT(_ucs2'\u0623' USING utf8);

Upvotes: 2

Views: 16644

Answers (2)

adecaneda
adecaneda

Reputation: 21

Inspired on @Joni's answer, and just in case you don't want to use functions, I put it all together in one query. The drawback is that you have to execute the query several times, since on every execution only one 'character' by row get transformed.

UPDATE table_name 
SET content = REPLACE(
  content,
  SUBSTRING(content, locate('\\u', content), 6),
  CHAR(CONV(SUBSTRING(content, LOCATE('\\u', content) + 2, 4), 16, 10)  using 
  ucs2)) 
WHERE content like '%\\u0%';

Upvotes: 2

Joni
Joni

Reputation: 111279

There is no built-in function to decode unicode escapes in MySQL, but you can create one; see below:

Note that the backslash is an escape character in MySQL, so you'll need to double them when you write SQL: '\\u0623\\u064e\\u0628\\u0627\\u0628'

DELIMITER //

CREATE FUNCTION STRINGDECODE(str TEXT CHARSET utf8)
RETURNS text CHARSET utf8 DETERMINISTIC
BEGIN
declare pos int;
declare escape char(6) charset utf8;
declare unescape char(3) charset utf8;
set pos = locate('\\u', str);
while pos > 0 do
    set escape = substring(str, pos, 6);
    set unescape = char(conv(substring(escape,3),16,10) using ucs2);
    set str = replace(str, escape, unescape);
    set pos = locate('\\u', str, pos+1);
end while;
return str;
END//

DELIMITER ;

Upvotes: 15

Related Questions