Reputation: 969
I need to query the database for entries that contain only a certain set of Unicode Japanese characters and nothing else.
I've tried using WHERE word RLIKE '^([あいうえお])+$'
but that doesn't work with Japanese because of lack of Unicode support in MySQL's regex.
Is there any other way to accomplish this?
Upvotes: 2
Views: 521
Reputation: 536765
MySQL is looking at each character as a byte sequence, so あ
is 0xE3, 0x81, 0x82 and your [あいうえお]
is actually looking for any sequence of bytes 0xE3, 0x81, 0x82, 0x84, 0x86, 0x88 and 0x8A. That will match あ
fine, but it will also match other sequences that don't correspond to a single character in the list, for example 0xE3, 0x82, 0x81 which is め
.
An alternative way of saying [あいうえお]
that would still work when each character is considered by the regex engine as being more than one symbol would be (あ|い|う|え|お)
.
SELECT 'あ' RLIKE '^([あいうえお])+$'; -- 1
SELECT 'め' RLIKE '^([あいうえお])+$'; -- 1
SELECT 'あ' RLIKE '^(あ|い|う|え|お)+$'; -- 1
SELECT 'め' RLIKE '^(あ|い|う|え|お)+$'; -- 0
Upvotes: 3