Reputation: 783
I am writing a function that returns linguistic information about the character at point. This is easy for precomposed characters, but I also want to account for diacritics. I believe Unicode refers to these as "marks" or "combining characters" (cf. the Combining Diacritical Marks block, U+0300 to U+036F).
For example, to place the breve diacritic (U+0306, COMBINING BREVE) on the character e:
e C-x 8 <RET> 0306 <RET>
Run C-u C-x = on the resulting character and you will see something like "Composed with the following character(s) ̆ ".
Functions such as following-char unfortunately return only the base character, i.e. "e", and ignore any combining diacritics. Is there any way to get these?
EDIT: slitvinov pointed out that the resulting glyph consists of two characters. If you place point before the glyph created by the above keystrokes and evaluate (point) before and after running forward-char, you will see point increase by 2. I figured I could hack a solution around this behaviour, but inside a progn form (or a function definition) forward-char moves point forward by only one. Try it in a defun or with (progn (forward-char) (point)). Why might this be?
Upvotes: 5
Views: 548
Reputation: 5768
I think the e with a diacritic is treated as two characters. I put the sequence e, e with the diacritic, e in a file:
eĕe
(char-after 1)
(char-after 2)
(char-after 3)
(char-after 4)
This gives me:
101 101 774 101
And 774 is the decimal form of hexadecimal 0306, i.e. U+0306.
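Since char-after does see the mark at its own buffer position, one way to get the diacritics might be to read the base character and then keep reading while the following characters are combining marks. Here is a minimal sketch under that assumption. The name my-char-with-marks-at is hypothetical (it is not a built-in), and treating the Unicode general categories Mn, Mc and Me as "combining" is my own choice; get-char-code-property and char-after are standard Emacs functions.

```elisp
;; Sketch only: `my-char-with-marks-at' is a made-up helper name.
(defun my-char-with-marks-at (&optional pos)
  "Return a list of the character at POS and any combining marks after it.
POS defaults to point.  Return nil if there is no character at POS."
  (let* ((pos (or pos (point)))
         (base (char-after pos))
         (marks '()))
    (when base
      (setq pos (1+ pos))
      ;; Assumption: general categories Mn, Mc and Me identify combining marks.
      (while (and (char-after pos)
                  (memq (get-char-code-property (char-after pos)
                                                'general-category)
                        '(Mn Mc Me)))
        (push (char-after pos) marks)
        (setq pos (1+ pos)))
      (cons base (nreverse marks)))))

;; Usage: the same three glyphs as above, e, e + U+0306, e.
(with-temp-buffer
  (insert "e" "e" #x0306 "e")
  (my-char-with-marks-at 2))   ; → (101 774)
```

Because this works on buffer positions with char-after rather than on cursor motion, it should not depend on how forward-char behaves interactively versus inside a progn.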
Upvotes: 2