Reputation: 1712
Is there a cross-platform way to convert from UTF-8 to Latin/Arabic and from Latin/Arabic to UTF-8 in C++?
Upvotes: 4
Views: 1574
Reputation: 76276
There is not, but there is a cross-platform way to convert between Unicode held in wchar_t (which is 16 bits wide on Windows and 32 bits on most other platforms) and whatever character encoding the system locale specifies: the wcstombs/mbstowcs routines from the standard C library, or the codecvt facet of locale in the standard C++ library. Converting between UTF-8 and wchar_t is then quite simple when each element holds one code point, as it does where wchar_t is 32 bits; with a 16-bit wchar_t, code points above U+FFFF need surrogate pairs. So you can write, or copy from somewhere, a routine to convert between UTF-8 and Unicode in wchar_t and combine it with wcstombs/mbstowcs.
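As a rough illustration of the UTF-8 half of that approach, here is a minimal sketch of such a routine. The function names utf8_to_utf32 and utf32_to_utf8 are made up for this example, and it uses char32_t instead of wchar_t so that each element holds one code point on every platform. That is a substitution, not what the answer prescribes; where wchar_t is 32 bits the values can be copied into a std::wstring before calling wcstombs.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 into one code point per element.
std::u32string utf8_to_utf32(const std::string& in) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates, and values past U+10FFFF.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point in UTF-8 input");
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: encode code points back into UTF-8.
std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Note that the wcstombs/mbstowcs step only produces Latin/Arabic if the active locale actually uses that encoding, and locale names differ between platforms, so that part still needs per-system care.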
Upvotes: 0
Reputation: 8774
There are libraries like ICU available. But Erik is, of course, right: the round-trip from Unicode through ISO 8859-6 will be lossy. (Yes, UTF-8 is “Unicode.” UTF-16 is “Unicode,” too; it just uses different bit patterns for the same code numbers. See Joel Spolsky's text if you didn't know that, or if you haven't read it yet. It's good material.)
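To make the lossiness concrete, here is a small sketch that maps already-decoded code points to ISO 8859-6 bytes by hand, without ICU. The function names and the '?' replacement policy are assumptions for illustration; the byte assignments follow the published ISO 8859-6 table as I understand it (0x00-0xA0 unchanged, Arabic letters at 0xC1-0xDA and 0xE0-0xF2), so check them against the official mapping before relying on this.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: map one code point to its ISO 8859-6 byte.
// Returns false when the code point has no ISO 8859-6 equivalent.
bool to_iso8859_6(char32_t cp, unsigned char& out) {
    if (cp <= 0xA0 || cp == 0xA4 || cp == 0xAD) {
        out = static_cast<unsigned char>(cp);  // identical positions
        return true;
    }
    if (cp == 0x060C) { out = 0xAC; return true; }  // Arabic comma
    if (cp == 0x061B) { out = 0xBB; return true; }  // Arabic semicolon
    if (cp == 0x061F) { out = 0xBF; return true; }  // Arabic question mark
    if (cp >= 0x0621 && cp <= 0x063A) {
        out = static_cast<unsigned char>(0xC1 + (cp - 0x0621));
        return true;
    }
    if (cp >= 0x0640 && cp <= 0x0652) {
        out = static_cast<unsigned char>(0xE0 + (cp - 0x0640));
        return true;
    }
    return false;
}

// Hypothetical helper: the reverse mapping; false for unassigned bytes.
bool from_iso8859_6(unsigned char b, char32_t& cp) {
    if (b <= 0xA0 || b == 0xA4 || b == 0xAD) { cp = b; return true; }
    if (b == 0xAC) { cp = 0x060C; return true; }
    if (b == 0xBB) { cp = 0x061B; return true; }
    if (b == 0xBF) { cp = 0x061F; return true; }
    if (b >= 0xC1 && b <= 0xDA) { cp = 0x0621 + (b - 0xC1); return true; }
    if (b >= 0xE0 && b <= 0xF2) { cp = 0x0640 + (b - 0xE0); return true; }
    return false;
}

// Encode a whole string, substituting `replacement` for anything that
// does not fit and counting the losses in *lost (if provided).
std::string encode_iso8859_6(const std::u32string& in,
                             char replacement = '?',
                             std::size_t* lost = nullptr) {
    std::string out;
    std::size_t dropped = 0;
    for (char32_t cp : in) {
        unsigned char b;
        if (to_iso8859_6(cp, b)) {
            out += static_cast<char>(b);
        } else {
            out += replacement;
            ++dropped;
        }
    }
    if (lost) *lost = dropped;
    return out;
}
```

Anything outside that small repertoire, such as accented Latin letters, the euro sign, or Persian letters like U+067E, comes out as the replacement character and cannot be recovered when converting back.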
Upvotes: 3