Reputation: 578
Is there any way to pattern match with Unicode graphemes?
As a quick example, when I run this query:
CREATE TABLE test (
id SERIAL NOT NULL,
name VARCHAR NOT NULL,
PRIMARY KEY (id),
UNIQUE (name)
);
INSERT INTO test (name) VALUES ('ππ» One');
INSERT INTO test (name) VALUES ('π Two');
SELECT * FROM public.test WHERE test.name LIKE 'π%';
I get both rows returned, rather than just 'π Two'
. Postgres seems to be just comparing code points, but I want it to compare full graphemes, so it should only match 'π Two'
, because ππ»
is a different grapheme.
Is this possible?
Upvotes: 4
Views: 310
Reputation: 23676
It's a very interesting question!
I am not quite sure if it is possible anyway:
The skinned emojis are, in fact, two joined characters (like ligatures). The first character is the yellow hand π which is followed by an emoji skin modifier π»
This is how the light skinned hand is stored internally. So, for me, your result makes sense:
When you query any string, that begins with π, it will return:
π Two
(trivial)π_π» One
(ignore the underscore, I try to suppress the automated ligature with this)So, you can see, the light skinned emoji internally also starts with π. That's why I believe, that your query doesn't work the way you like.
Workarounds/Solutions:
You can add a space to your query. This ensures, that there's no skin modifier after your π character. Naturally, this only works in your case, where all data sets have a space after the hand:
SELECT * FROM test WHERE name LIKE 'π %';
You can simply extend the WHERE
clause like this:
SELECT * FROM test
WHERE name LIKE 'π%'
AND name NOT LIKE 'ππ»%'
AND name NOT LIKE 'ππΌ%'
AND name NOT LIKE 'ππ½%'
AND name NOT LIKE 'ππΎ%'
AND name NOT LIKE 'ππΏ%'
You can use regular expression pattern matching to exclude the skins:
SELECT * FROM test
WHERE name ~ '^π[^π»πΌπ½πΎπΏ]*$'
see demo:db<>fiddle (note that the fiddle seems not to provide automated ligatures, so both characters are separated displayed there)
Upvotes: 3