Reputation: 2233
Alright, so I've recently dipped back into C++. It's been 13 years since I've even looked at any C/C++ code.
I am designing a piece of software for Windows, and what I am struggling with is integrating 3rd-party code (such as libssh2) that is strictly UTF-8 and provides no wide-character API. Coming back to Windows, every API I've seen uses UTF-16 (wchar_t).
So my question is: am I forced to do string conversions every time I use a non-Windows library (libssh2, for example)? I have a variable that is returned as a wchar_t, but the libssh2 APIs only provide a char implementation.
Should I stick to using char rather than wchar_t? If I do that, then I am forced once again to convert to wchar_t to use the Windows API. I am using several 3rd-party libraries and several Windows APIs in my code. My head hurts.
What is the best practice here?
Upvotes: 1
Views: 333
Reputation: 51506
What is the best practice here?
You know the answer already. If an API requires character strings with a particular encoding, you must supply character strings with that character encoding.
If you are dealing with several APIs that expect strings in different character encodings, you have to convert between the encodings.
Windows uses UTF-16 throughout (with very few exceptions). To convert from UTF-8 to UTF-16 and from UTF-16 back to UTF-8, call MultiByteToWideChar and WideCharToMultiByte, respectively.
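A minimal sketch of a pair of conversion helpers built on those two calls (the helper names are my own, and error handling is kept deliberately simple). Both use the standard two-pass idiom: a first call with a null output buffer to get the required length, then a second call to do the conversion:

```cpp
#include <stdexcept>
#include <string>
#include <windows.h>

// UTF-8 -> UTF-16 (hypothetical helper; throws on invalid input)
std::wstring Widen(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    // First pass: ask for the required length in wchar_t units.
    int len = ::MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                    utf8.data(), (int)utf8.size(),
                                    nullptr, 0);
    if (len == 0) throw std::runtime_error("MultiByteToWideChar failed");
    std::wstring utf16(len, L'\0');
    // Second pass: perform the actual conversion.
    ::MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                          utf8.data(), (int)utf8.size(),
                          &utf16[0], len);
    return utf16;
}

// UTF-16 -> UTF-8 (hypothetical helper; throws on invalid input)
std::string Narrow(const std::wstring& utf16)
{
    if (utf16.empty()) return std::string();
    int len = ::WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                                    utf16.data(), (int)utf16.size(),
                                    nullptr, 0, nullptr, nullptr);
    if (len == 0) throw std::runtime_error("WideCharToMultiByte failed");
    std::string utf8(len, '\0');
    ::WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                          utf16.data(), (int)utf16.size(),
                          &utf8[0], len, nullptr, nullptr);
    return utf8;
}
```

The MB_ERR_INVALID_CHARS / WC_ERR_INVALID_CHARS flags make the calls fail on malformed input instead of silently substituting replacement characters, which is usually what you want at an encoding boundary.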
An advantage of wchar_t over char (on Windows) is that wchar_t unambiguously designates UTF-16LE encoded characters, whereas char can be ASCII, ANSI, UTF-8, or some other encoding. If none of the other factors have produced a decision, going with wchar_t/UTF-16 on Windows provides additional safety: it allows the compiler to report an error when (potentially) passing a non-Unicode character string to an API expecting wchar_t/UTF-16.
Upvotes: 2
Reputation: 11588
Your best bet is to use the encoding that you most often use everywhere and convert at the boundaries. In this case, it sounds like you want to use UTF-8 strings everywhere and convert to UTF-16 and back at each Windows API call point (or set of calls, if they're consecutive), since it sounds like you have far more external calls than Windows API calls. This should limit the number of conversions you actually have to do, and should perform reasonably well.

If you find that conversion like this is too slow, use instrumentation to be sure, and then see if there are other APIs you can use for conversion (refer to Raymond Chen's "Loading the dictionary" sub-series for a good read on the latter, but remember Knuth's maxim on premature optimization).
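As an illustration of that pattern (the function name and its use of CreateFileW are invented for the example), a UTF-8 string from your portable code is widened only at the point where it crosses into Win32:

```cpp
#include <string>
#include <windows.h>

// Hypothetical sketch: the program keeps std::string/UTF-8 internally
// and converts to UTF-16 only for the duration of a Win32 call.
HANDLE OpenLogFile(const std::string& utf8Path)
{
    // Two-pass conversion, scoped to this one boundary crossing.
    int len = ::MultiByteToWideChar(CP_UTF8, 0, utf8Path.data(),
                                    (int)utf8Path.size(), nullptr, 0);
    std::wstring widePath(len, L'\0');
    ::MultiByteToWideChar(CP_UTF8, 0, utf8Path.data(),
                          (int)utf8Path.size(), &widePath[0], len);

    // The wide string never leaks back into the UTF-8 core.
    return ::CreateFileW(widePath.c_str(), GENERIC_WRITE, 0, nullptr,
                         CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
}
```

Keeping the wide strings as short-lived locals like this makes it easy to see (and count) every conversion when you later profile.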
Upvotes: 2