cuteCAT

Reputation: 2311

Determining whether a UTF-16 value is a surrogate in libunistring

Does GNU libunistring have an API to determine whether a value is a UTF-16 surrogate? I am new to this library and could not locate one. Can someone help?

Upvotes: 0

Views: 69

Answers (2)

Remy Lebeau

Reputation: 595402

Perhaps uc_general_category() is what you are looking for. If you pass it a UTF-16 code unit, the compiler will widen the 16-bit value to 32 bits and the function will interpret it as-is as a code point. Code points U+D800 through U+DFFF are reserved for surrogates only, so the function should return UC_SURROGATE for any UTF-16 surrogate code unit. A non-surrogate code unit has the same numeric value as its corresponding code point in the BMP (surrogate pairs are only needed for code points outside the BMP), so the function would return something else.

Upvotes: 3

ooga

Reputation: 15501

Of the two 16-bit code units of a surrogate pair, the "high" surrogate is in the range 0xD800..0xDBFF and the "low" surrogate is in the range 0xDC00..0xDFFF. So it's easy to check that yourself.
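
For illustration, here is a minimal sketch of that check in plain C. The helper names (is_high_surrogate, is_low_surrogate, is_surrogate) are hypothetical and are not part of libunistring; they only encode the two ranges given above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not libunistring API. */

/* High (lead) surrogate: 0xD800..0xDBFF */
static inline bool is_high_surrogate(uint16_t u)
{
    return u >= 0xD800 && u <= 0xDBFF;
}

/* Low (trail) surrogate: 0xDC00..0xDFFF */
static inline bool is_low_surrogate(uint16_t u)
{
    return u >= 0xDC00 && u <= 0xDFFF;
}

/* Any surrogate: 0xD800..0xDFFF. Every value in that range has the
   same top five bits (11011), so a single mask-and-compare suffices. */
static inline bool is_surrogate(uint16_t u)
{
    return (u & 0xF800) == 0xD800;
}
```

With these, is_surrogate(0xD83D) and is_surrogate(0xDE00) are true, while is_surrogate(0x0041) and is_surrogate(0xE000) are false.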

Upvotes: 2
