Reputation: 146968
I'm looking at the IsCharAlphaNumeric Windows API function. As it only takes a single TCHAR, it obviously can't make any decisions about surrogate pairs in UTF-16 content. Does that mean that there are no alphanumeric characters that are surrogate pairs?
Upvotes: 3
Views: 1908
Reputation: 45172
Characters outside the BMP can be letters. (Michael Kaplan recently discussed a bug in the classification of the character U+1F48C.) But IsCharAlphaNumeric cannot see characters outside the BMP (for the reasons you noted), so you cannot obtain classification information for them that way.
If you have a surrogate pair, call GetStringType with cchSrc = 2 and check for C1_ALPHA and C1_DIGIT.
Edit: The second half of this answer is incorrect: GetStringType does not support surrogate pairs.
Upvotes: 5
Reputation: 120516
Does that mean that there are no alphanumeric characters that are surrogate pairs?
No, there are supplementary code-points that are in the letter group.
Comparing a char to a code-point?
For example, Character.isLetter('\uD840') returns false, even though this specific value, if followed by any low-surrogate value in a string, would represent a letter.
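To make the char versus code-point difference concrete, here is a minimal Java sketch. The class and method names (SurrogateLetterCheck, isLetterByChar, isLetterByPair) are made up for illustration; only the java.lang.Character calls are standard. It classifies the high surrogate alone, then combines it with a low surrogate and classifies the resulting code point.

```java
public class SurrogateLetterCheck {
    // Classifies a single UTF-16 code unit. A lone surrogate is never a letter.
    public static boolean isLetterByChar(char c) {
        return Character.isLetter(c);
    }

    // Combines a high/low surrogate pair into one code point and classifies that.
    public static boolean isLetterByPair(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return Character.isLetter(Character.toCodePoint(high, low));
    }

    public static void main(String[] args) {
        char high = '\uD840';
        char low = '\uDC00'; // any low surrogate would do; this one is just an example
        System.out.println(isLetterByChar(high));        // high surrogate on its own
        System.out.println(isLetterByPair(high, low));   // the pair as one code point
        System.out.printf("U+%X%n", Character.toCodePoint(high, low));
    }
}
```

The pair D840 DC00 decodes to U+20000, a supplementary-plane CJK ideograph, which is why the int overload of Character.isLetter can classify it while the char overload cannot.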
Upvotes: 0
Reputation: 477268
You can determine for yourself, by looking at the Unicode plane assignments, what you are missing by not being able to inspect non-BMP code points.
For example, you won't be able to identify Imperial Aramaic characters as alphanumeric. Shame.
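As a rough illustration of what gets missed, the following Java sketch counts alphanumerics in a string twice: once per UTF-16 code unit and once per code point. The class and method names are hypothetical, and the result for the Aramaic letter assumes a runtime (Java 8 or later) whose Unicode tables include the Imperial Aramaic block.

```java
public class AlnumCounter {
    // Counts UTF-16 code units that are letters or digits.
    // Surrogate halves never qualify, so non-BMP letters are missed.
    public static int countByChar(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetterOrDigit(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    // Counts whole code points that are letters or digits.
    public static int countByCodePoint(String s) {
        return (int) s.codePoints().filter(Character::isLetterOrDigit).count();
    }

    // Builds "a", U+10840 (IMPERIAL ARAMAIC LETTER ALEPH), "1".
    public static String sample() {
        return new StringBuilder()
                .append('a')
                .appendCodePoint(0x10840)
                .append('1')
                .toString();
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println("by char:       " + countByChar(s));
        System.out.println("by code point: " + countByCodePoint(s));
    }
}
```

The per-char count sees only the two BMP characters, while the per-code-point count also picks up the Aramaic letter. That is the same gap a single-TCHAR API leaves.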
Upvotes: 0