Puppy

Reputation: 146968

Unicode alphanumeric character range

I'm looking at the IsCharAlphaNumeric Windows API function. Since it takes only a single TCHAR, it can't make any decisions about surrogate pairs in UTF-16 content. Does that mean that there are no alphanumeric characters that are surrogate pairs?

Upvotes: 3

Views: 1908

Answers (3)

Raymond Chen

Reputation: 45172

Characters outside the BMP can be letters. (Michael Kaplan recently discussed a bug in the classification of the character U+1F48C.) But IsCharAlphaNumeric cannot see characters outside the BMP (for the reasons you noted), so you cannot obtain classification information for them that way.

If you have a surrogate pair, call GetStringType with cchSrc = 2 and check for C1_ALPHA and C1_DIGIT.

Edit: The second half of this answer is incorrect. GetStringType does not support surrogate pairs.

Upvotes: 5

Mike Samuel

Reputation: 120516

Does that mean that there are no alphanumeric characters that are surrogate pairs?

No, there are supplementary code points that are letters.

See also the related question "Comparing a char to a code-point?"

For example, in Java, Character.isLetter('\uD840') returns false, even though this high surrogate, when followed by any low surrogate in a string, represents a letter.
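
A minimal Java sketch of the point above, assuming Java 8 or later. The class name SurrogateLetters and the helper countAlphanumeric are hypothetical names chosen for illustration. It contrasts the char overload of Character.isLetter with the int (code point) overload, and counts alphanumerics by code point so that a surrogate pair is classified as one character.

```java
public class SurrogateLetters {
    // Hypothetical helper: counts letters and digits by code point rather
    // than by UTF-16 code unit, so a surrogate pair is classified once.
    static long countAlphanumeric(String s) {
        return s.codePoints().filter(Character::isLetterOrDigit).count();
    }

    public static void main(String[] args) {
        char high = '\uD840';
        char low = '\uDC00';
        // Combine the pair into the supplementary code point U+20000.
        int codePoint = Character.toCodePoint(high, low);

        System.out.println(Character.isLetter(high));      // char overload: false
        System.out.println(Character.isLetter(codePoint)); // int overload: true
        System.out.println(countAlphanumeric("a1" + high + low)); // 3
    }
}
```

The int overloads of the Character classification methods are the ones to use when the text may contain characters outside the BMP.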

Upvotes: 0

Kerrek SB

Reputation: 477268

By looking at the Unicode plane assignments, you can determine for yourself what you are missing by not being able to inspect non-BMP code points.

For example, you won't be able to identify Imperial Aramaic characters as alphanumeric. Shame.
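
To illustrate with Java rather than the Windows API (an assumption of this sketch, as are the class name AramaicCheck and the helper isSupplementaryLetter), the following checks U+10840, the first code point of the Imperial Aramaic block. It is a letter, but it lies outside the BMP and needs two UTF-16 code units, so a function that takes a single code unit can never see it.

```java
public class AramaicCheck {
    // Hypothetical helper: true for letters that lie outside the BMP,
    // i.e. letters that a single-code-unit API cannot classify.
    static boolean isSupplementaryLetter(int codePoint) {
        return Character.isSupplementaryCodePoint(codePoint)
                && Character.isLetter(codePoint);
    }

    public static void main(String[] args) {
        int aleph = 0x10840; // IMPERIAL ARAMAIC LETTER ALEPH

        System.out.println(Character.UnicodeScript.of(aleph)); // script of the code point
        System.out.println(Character.charCount(aleph));        // UTF-16 code units needed
        System.out.println(isSupplementaryLetter(aleph));
        System.out.println(isSupplementaryLetter('A'));        // BMP letter, so false
    }
}
```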

Upvotes: 0
