Reputation: 5923
I'm trying to ensure that a column in Postgres only holds a single emoji. However, I'm struggling with the length constraint since e.g.
length('😁') -- 1
length('🇧🇷') -- 2
length('👩❤️💋👩') -- 8
I've found How to count the correct length of a string with emojis in javascript? with some guidance on how to count the length in JS, but it's not that useful for Postgres since the approach uses the Unicode regex flag, which is not available in Postgres.
How would I count string length in Postgres such that all the emojis in my example count as length 1?
Upvotes: 1
Views: 1203
Reputation: 82734
I’d love to stand corrected, but I assume you’ll be out of luck here.
Emojis are a wild beast, character-wise. There are several potential groups, that you’d all need to cover:
Emoji
Unicode property.And to top it all off, there is both an official ever-growing list of emojis and emojis, that are only supported by some platforms. (Example: Man Zombie: Light Skin Tone ) Unicode calls them non-RGI, not Recommended For General Interchange.
You need to decide, whether you want to support only Unicode-approved emojis or non-RGIs, too.
If you want only Unicode emojis, you could model your requirements with another second table, that you can periodically recreate from the official Unicode data. E.g., take this file:
https://github.com/unicode-org/unicodetools/blob/main/unicodetools/data/emoji/14.0/emoji-test.txt
(Note the “14.0” in the URL that you need to update with new Unicode Emoji versions!)
Take all lines with the text “fully-qualified”, in them everything up until the first colon, convert that from hex code points into a string and feed your helper table with that.
Example:
curl -sS https://raw.githubusercontent.com/unicode-org/unicodetools/main/unicodetools/data/emoji/14.0/emoji-test.txt | \
sed '/^\(#.*\)\?$/d' | \
sed -n '/fully-qualified/p' | \
sed 's/ *;.*//'
This gives you a long list of code points in hex format (e.g. 1F9D1 200D 1F52C
). You can feed them in a script, change them from hex to a string and put them in a small helper table:
CREATE TABLE unicode_emojis (
emoji TEXT PRIMARY KEY
);
Then in your other queries make sure, that values that go into your original table are in the unicode_emojis
table, too.
Upvotes: 2