Reputation: 10037
Lua has a function named utf8.len(), which operates on a const char * and, according to the docs, does the following:
Returns the number of UTF-8 characters in string s https://www.lua.org/manual/5.3/manual.html#6.5
I'm working with a customized version of Lua that interfaces with the Win32 API. Whenever I need to pass a UTF-8 string to the Win32 backend of my app, I convert it from UTF-8 to WCHAR using MultiByteToWideChar().
Now I'm looking for a function that does exactly the same as Lua's utf8.len() but takes a UTF-16 WCHAR* string instead of a UTF-8 const char* string. Please don't start a discussion about Unicode intricacies or terminology. I have already been told that the term character is very ambiguous when talking about Unicode, but the Lua documentation uses exactly this term (see above). So, regardless of what the Lua authors actually mean by character, I just want a function that gives me exactly the same count as utf8.len() but operates on a UTF-16 WCHAR* string generated from a UTF-8 string by MultiByteToWideChar().
I hope the question is now finally sufficiently clear...
One last note: I'd like to avoid using external libraries like ICU if possible. Win32 API solutions are preferred.
Upvotes: 0
Views: 1571
Reputation: 11588
Looking at the Lua utf8 source code, utf8.len() just counts the number of code points, so (for example) combining characters would be counted separately. wcslen() is the way to go, then.
You should, however, note that if the string contains characters outside the BMP (U+10000 or higher; emoji, for instance), wcslen() won't return the same thing as utf8.len(). This is because UTF-16 cannot represent these code points with a single 16-bit code unit; instead, it encodes each one as two special code units that together are called a surrogate pair, and wcslen() counts both of them. If you need to treat a surrogate pair as a single code point, you're going to have to write that length loop yourself.
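A minimal sketch of such a length loop, assuming the string is NUL-terminated and came out of MultiByteToWideChar(). The name utf16_codepoint_len is made up for illustration and is not part of Lua or the Win32 API. On Win32, WCHAR is a typedef for wchar_t, so a WCHAR* can be passed directly.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not a Lua or Win32 function): counts code points
 * in a NUL-terminated UTF-16 string. A high surrogate immediately followed
 * by a low surrogate counts as one code point; every other code unit,
 * including an unpaired surrogate, counts as one. */
size_t utf16_codepoint_len(const wchar_t *s)
{
    size_t count = 0;

    while (*s != 0) {
        if (*s >= 0xD800 && *s <= 0xDBFF &&
            s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
            s += 2;   /* surrogate pair: two code units, one code point */
        } else {
            s += 1;   /* BMP code unit (or unpaired surrogate) */
        }
        count++;
    }
    return count;
}
```

For valid UTF-8 input this should give the same count as utf8.len(). One difference to keep in mind: utf8.len() returns a false value plus a byte position when it hits an invalid sequence, whereas this loop always returns a count, so the two can only agree on well-formed strings.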
Upvotes: 1