Andrei Baskakov

Reputation: 161

Convert wchar_t* to UTF-16 string

I need C++ code to convert a string given as wchar_t* to a UTF-16 string. It must work on both Windows and Linux. I have looked through a lot of web pages while searching, but the subject is still not clear to me.

As I understand it, I need to:

  1. Call setlocale with LC_CTYPE and UTF-16 encoding.
  2. Use wcstombs to convert wchar_t to UTF-16 string.
  3. Call setlocale to restore previous locale.

Is there a portable way (Windows and Linux) to convert wchar_t* to UTF-16?

Upvotes: 6

Views: 10451

Answers (5)

Mihai Nita

Reputation: 5787

You can assume that wchar_t is UTF-32 in the non-Windows world. This is true on Linux, Mac OS X, and most *nix systems (there are very few exceptions, and they are on systems you will probably never touch :-)

And wchar_t is UTF-16 on Windows, so there the conversion function can just do a memcpy :-)

On everything else, the conversion is algorithmic and pretty simple, so there is no need for fancy support from 3rd party libraries.

Here is the basic algorithm: http://unicode.org/faq/utf_bom.html#utf16-3

And you can probably find a dozen different implementations if you don't want to write your own :-)
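For illustration, the surrogate-pair arithmetic from the linked FAQ can be sketched as below. This is a minimal sketch rather than a tested library routine: the function name encode_utf16 and the convention of returning 0 for invalid input are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Encodes one Unicode code point as UTF-16 into out[0..1].
// Returns the number of 16-bit units written (1 or 2), or 0 if the
// input is not a valid scalar value (a surrogate, or above U+10FFFF).
// The name and the error convention are hypothetical.
std::size_t encode_utf16(std::uint32_t cp, std::uint16_t out[2]) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  // lone surrogate
    if (cp > 0x10FFFF) return 0;                 // outside Unicode
    if (cp < 0x10000) {                          // BMP: one unit, unchanged
        out[0] = static_cast<std::uint16_t>(cp);
        return 1;
    }
    const std::uint32_t v = cp - 0x10000;        // 20-bit offset
    out[0] = static_cast<std::uint16_t>(0xD800 + (v >> 10));    // high surrogate
    out[1] = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));  // low surrogate
    return 2;
}
```

For example, U+1F600 has the offset 0xF600, whose top ten bits are 0x3D and bottom ten bits are 0x200, so it becomes the pair D83D DE00.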

Upvotes: 3

wilx

Reputation: 18268

The problem is that wchar_t is rather underspecified. You could use GNU libiconv to do what you want. It accepts the special encoding name "wchar_t" as both source and target encoding. That way the code will be portable to Windows, Linux, and anywhere else you can provide libiconv.
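A rough sketch of the iconv approach might look like this. Several things here are assumptions: the helper name wide_to_utf16le_iconv is made up, the target encoding "UTF-16LE" is one choice among several, and the second parameter of iconv() is declared as char** (as in glibc) although some implementations declare it const char**. On platforms that ship libiconv separately you may also need to link with -liconv.

```cpp
#include <iconv.h>

#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a null-terminated wchar_t string to
// UTF-16LE bytes (no BOM) by asking iconv to read the "WCHAR_T" encoding.
std::string wide_to_utf16le_iconv(const wchar_t* s) {
    iconv_t cd = iconv_open("UTF-16LE", "WCHAR_T");
    if (cd == reinterpret_cast<iconv_t>(-1))
        throw std::runtime_error("iconv_open failed");

    std::size_t in_left = std::wcslen(s) * sizeof(wchar_t);
    // UTF-16 output never needs more bytes than 2- or 4-byte wchar_t input.
    std::vector<char> buf(in_left + 1);

    char* in = const_cast<char*>(reinterpret_cast<const char*>(s));
    char* out = buf.data();
    std::size_t out_left = buf.size();

    const std::size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("iconv conversion failed");

    return std::string(buf.data(), buf.size() - out_left);
}
```

The result is a byte string, so the caller decides how to reinterpret it; choosing "UTF-16BE" or plain "UTF-16" instead changes byte order and may add a byte order mark.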

Upvotes: 2

Nicol Bolas

Reputation: 474376

There is no single cross-platform method for doing this in C++03 (not without a library). This is in part because wchar_t is itself not the same thing across platforms. Under Windows, wchar_t is a 16-bit value, while on other platforms it is often a 32-bit value. So you would need two different codepaths to do it.
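The two code paths could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses C++11's std::u16string rather than C++03 types, it assumes a 2-byte wchar_t already holds UTF-16 and a wider wchar_t holds UTF-32, and the helper name wide_to_utf16 and the choice to replace invalid code points with U+FFFD are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts a null-terminated wchar_t string to UTF-16.
// Path 1 (2-byte wchar_t, e.g. Windows): units are copied unchanged.
// Path 2 (wider wchar_t, e.g. Linux): each unit is treated as a UTF-32
// code point and encoded, using a surrogate pair above U+FFFF.
std::u16string wide_to_utf16(const wchar_t* s) {
    std::u16string out;
    for (; *s != L'\0'; ++s) {
        const unsigned long cp = static_cast<unsigned long>(*s);
        if (sizeof(wchar_t) == 2) {
            out.push_back(static_cast<char16_t>(cp));      // already UTF-16
        } else if ((cp >= 0xD800UL && cp <= 0xDFFFUL) || cp > 0x10FFFFUL) {
            out.push_back(static_cast<char16_t>(0xFFFD));  // invalid input
        } else if (cp < 0x10000UL) {
            out.push_back(static_cast<char16_t>(cp));      // BMP code point
        } else {
            const unsigned long v = cp - 0x10000UL;        // 20-bit offset
            out.push_back(static_cast<char16_t>(0xD800UL + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00UL + (v & 0x3FFUL)));
        }
    }
    return out;
}
```

Branching on sizeof(wchar_t) keeps both paths in one function; a preprocessor check on WCHAR_MAX would be an alternative if the unused branch should not be compiled at all.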

Upvotes: 8

JTeagle

Reputation: 2196

The g++ compiler appears to support wcstombs?

Upvotes: -1

Pubby

Reputation: 53097

C++11's std::codecvt_utf16 should work, I think.

std::codecvt_utf16 is a std::codecvt facet which encapsulates conversion between a UTF-16 encoded byte string and UCS2 or UCS4 character string (depending on the type of Elem).

See this: http://en.cppreference.com/w/cpp/locale/codecvt_utf16
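As a hedged sketch, the facet can be driven through std::wstring_convert like this. The helper name wide_to_utf16le_bytes is hypothetical, and the sketch assumes a 32-bit wchar_t holding UTF-32 (as on Linux); with a 16-bit wchar_t the facet treats the input as UCS2, so surrogate pairs are not handled on that path. Note also that <codecvt> and std::wstring_convert were deprecated in C++17, so compilers may warn.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: serializes a wide string as UTF-16LE bytes (no BOM).
// Assumes wchar_t is 32 bits wide and holds UTF-32 code points.
// to_bytes() throws std::range_error if the conversion fails.
std::string wide_to_utf16le_bytes(const std::wstring& ws) {
    std::wstring_convert<
        std::codecvt_utf16<wchar_t, 0x10FFFF, std::little_endian>> conv;
    return conv.to_bytes(ws);
}
```

The result is a byte string, two bytes per UTF-16 unit. Dropping std::little_endian from the template arguments gives big-endian output, which is the facet's default.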

Upvotes: 5
