Philip Seyfi
Philip Seyfi

Reputation: 969

MySQL string contains only certain unicode characters

I need to query the database for entries that contain only a certain set of Unicode Japanese characters and nothing else.

I've tried using WHERE word RLIKE '^([あいうえお])+$' but that doesn't work with Japanese because of lack of Unicode support in MySQL's regex.

Is there any other way to accomplish this?

Upvotes: 2

Views: 521

Answers (1)

bobince
bobince

Reputation: 536765

MySQL is looking at each character as a byte sequence, so is 0xE3, 0x81, 0x82 and your [あいうえお] is actually looking for any sequence of bytes 0xE3, 0x81, 0x82, 0x84, 0x86, 0x88 and 0x8A. That will match fine, but it will also match other sequences that don't correspond to a single character in the list, for example 0xE3, 0x82, 0x81 which is .

An alternative way of saying [あいうえお] that would still work when each character is considered by the regex engine as being more than one symbol would be (あ|い|う|え|お).

SELECT 'あ' RLIKE '^([あいうえお])+$';    -- 1
SELECT 'め' RLIKE '^([あいうえお])+$';    -- 1
SELECT 'あ' RLIKE '^(あ|い|う|え|お)+$';  -- 1
SELECT 'め' RLIKE '^(あ|い|う|え|お)+$';  -- 0

Upvotes: 3

Related Questions