Reputation: 311
Are there any PostgreSQL functions other than ascii() to show the code points and UTF-8 encodings for characters?
ascii() (as the name suggests?) is limited, as the following example shows:
the character ą, an a with the diacritic ̨, is actually a combination of two characters rendered as one:
an a: \x61 (= 97 in decimal)
a so-called combining character, a separate ogonek: ̨ \xCC\xA8
ascii() is not suitable for this kind of character (combination):
select ascii('ą');
ascii
-------
97
97 is the code point of the character a, so the full code point of ą is not shown.
How can I get the code point and utf8 encoding for any character in PostgreSQL, i.e. also for combined ones?
Upvotes: 1
Views: 569
Reputation: 247260
The trouble is that this is not a single character but a combination of two: an a and a “combining character”. Although the pair is rendered as a single glyph, it is still two code points, and ascii() only reports the first one.
If you used the single precomposed character ą (Unicode code point U+0105, 261 in decimal), you wouldn't have that problem.
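
To see both forms side by side, you can split a string into its code points and inspect each one. This is only a sketch, assuming a UTF8-encoded database; the U&'...' escapes and the aliases (form, s, ch, pos) are chosen here for illustration:

```sql
-- Assumes the server encoding is UTF8, so ascii() returns Unicode code points.
-- U&'a\0328' is "a" followed by the combining ogonek; U&'\0105' is the precomposed letter.
SELECT v.form,
       t.pos,
       ascii(t.ch)              AS code_point,
       convert_to(t.ch, 'UTF8') AS utf8_bytes
FROM (VALUES ('decomposed',  U&'a\0328'),
             ('precomposed', U&'\0105')) AS v(form, s)
-- An empty pattern splits the string into individual characters (code points).
CROSS JOIN LATERAL regexp_split_to_table(v.s, '') WITH ORDINALITY AS t(ch, pos)
ORDER BY v.form, t.pos;
```

The decomposed string should come back as two rows (97 with \x61, then 808 with \xcca8), the precomposed one as a single row (261 with \xc485).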
You would need software that translates character combinations into single characters (where possible). As far as I know, PostgreSQL releases before version 13 have no built-in function for this; version 13 added normalize().
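
On PostgreSQL 13 or later, the built-in normalize() function can do this kind of composition when the server encoding is UTF8. A hedged sketch, assuming NFC is the normal form you want; the column aliases are only illustrative:

```sql
-- Requires PostgreSQL 13+ and a UTF8 server encoding.
-- NFC composes "a" + combining ogonek (U+0328) into the single code point U+0105.
SELECT ascii(normalize(U&'a\0328', NFC))              AS code_point,
       convert_to(normalize(U&'a\0328', NFC), 'UTF8') AS utf8_bytes,
       U&'a\0328' IS NFC NORMALIZED                   AS input_was_nfc;
```

This should report 261, \xc485 and false. Note that not every base-plus-combining sequence has a precomposed equivalent, so after normalization a string can still contain several code points per visible character.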
Upvotes: 1