theglossy1
theglossy1

Reputation: 553

MariaDB PCRE Regular Expression issue

I want to query a range of characters in the Devanagari language. If I use

SELECT count(`dev`) FROM `transliteration` WHERE `dev` REGEXP 'क़';

It returns a count of 71. But I want to be able to query it via its unicode reference, so I would think that the following should work:

SELECT count(`dev`) FROM `transliteration` WHERE `dev` REGEXP '[\\x{958}]';

MariaDB supports PCRE so I should be able to do it, even if I have not thus far succeeded. I even read the documentation but to no avail. It seems like it's treating the number in the curly braces like I want repetition if I have more than two hex digits. Well, unicode has a lot more than two digits-worth of characters!

As mentioned, I would like to query a range ([\x{0958}-\x{095E}]), but first things first.

Upvotes: 1

Views: 336

Answers (1)

Rick James
Rick James

Reputation: 142346

Assuming you are working with utf8 encoding throughout the system, ...

This might work:

SELECT 'क़ज़फ़' REGEXP '^[क़-फ़]+$';

(It works in my test environment.)

If not, then you might need to build the REGEXP, noting that here is how you can convert between utf8 and unicode:

SELECT CONVERT(UNHEX('0958') USING ucs2); --> क़
SELECT HEX(CONVERT('क़' USING ucs2))       --> 0958

Upvotes: 2

Related Questions