Victor Mezrin

Reputation: 2847

byte representation of ASCII symbols in std::wstring with different locales

Windows C++ app. We have a string that contains only ASCII symbols: std::wstring(L"abcdeABCDE ... any other ASCII symbol"). Note that this is a std::wstring, which uses wchar_t.

Question: does the byte representation of this string depend on the localization settings, or on anything else? Can I assume that if I receive such a string (for example, from the Windows API) while the app is running, its bytes will be the same as on my PC?

Upvotes: 0

Views: 440

Answers (2)

Jim Beveridge

Reputation: 340

The byte representation of a string literal does not depend on the runtime environment: it is fixed at compile time from the binary data the editor saved in the source file. How a narrow string's bytes are interpreted, however, depends on the current code page, so converting one to a wide string at runtime can give different results. Defining the string with a leading L avoids this, because the wide characters are set at compile time.
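As a rough way to check this yourself, the sketch below copies out the raw bytes of a wide string and tests whether every character is in the ASCII range. The helper names wide_bytes and is_all_ascii are made up for illustration; they are not part of any library.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copies the raw in-memory bytes of a wide string.
// The result has s.size() * sizeof(wchar_t) bytes, in the platform's byte order.
std::vector<unsigned char> wide_bytes(const std::wstring& s) {
    std::vector<unsigned char> bytes(s.size() * sizeof(wchar_t));
    if (!bytes.empty()) {
        std::memcpy(bytes.data(), s.data(), bytes.size());
    }
    return bytes;
}

// Hypothetical helper: true if every wchar_t is in the ASCII range 0x00..0x7F.
bool is_all_ascii(const std::wstring& s) {
    for (wchar_t c : s) {
        // Cast to an unsigned type so the check also works where wchar_t is signed.
        if (static_cast<unsigned long>(c) > 0x7F) {
            return false;
        }
    }
    return true;
}
```

On Windows, where wchar_t is a 16-bit UTF-16 code unit stored little-endian, wide_bytes(L"abc") would hold 61 00 62 00 63 00. On platforms with a 32-bit wchar_t each character takes four bytes instead, so the layout is a property of the platform rather than of the locale.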

To be safe, use setlocale() to guarantee the encoding used for conversion. Then you don't have to worry about the environment.
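A minimal sketch of that advice, assuming the narrow string is converted with the standard mbstowcs function. The wrapper name widen_with_locale is hypothetical, and note that setlocale changes process-wide state, so a real program may want to save and restore the previous locale.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical wrapper: selects the given locale for character conversion,
// then widens `narrow` into `wide`. Returns false if the locale name is not
// available or the input is not valid in that locale's encoding.
bool widen_with_locale(const std::string& narrow, const char* locale_name,
                       std::wstring& wide) {
    // Pin the encoding that mbstowcs will use (this affects the whole process).
    if (std::setlocale(LC_CTYPE, locale_name) == nullptr) {
        return false;
    }
    // A wide string never needs more elements than the narrow one has bytes.
    std::wstring buffer(narrow.size(), L'\0');
    const std::size_t written =
        std::mbstowcs(&buffer[0], narrow.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return false;  // invalid multibyte sequence for this locale
    }
    buffer.resize(written);
    wide = buffer;
    return true;
}
```

For pure ASCII input, the "C" locale is enough: the widened result should compare equal to the same text written as an L"..." literal.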

This might help: "By definition, the ASCII character set is a subset of all multibyte-character sets. In many multibyte character sets, each character in the range 0x00 – 0x7F is identical to the character that has the same value in the ASCII character set. For example, in both ASCII and MBCS character strings, the 1-byte NULL character ('\0') has value 0x00 and indicates the terminating null character."

From: Visual Studio Character Sets 'Not set' vs 'Multi byte character set'

Upvotes: 1

VolAnd

Reputation: 6407

In general, for ordinary characters (not escape sequences), wchar_t and wstring have to use the same codes as ASCII (just extended to 2 bytes). But I am not sure about codes less than 32, and of course codes of 128 and above, which are outside ASCII, can take on a different meaning at the moment of output. To avoid problems on output, set a particular locale explicitly, e.g.:

  locale("en_US.UTF-8")

for standard output

  wcout.imbue(locale("en_US.UTF-8")); 

UPDATE:

I found one more suggestion about adding

  std::ios_base::sync_with_stdio(false);

before setting the locale with imbue.

See the details in: How can I use std::imbue to set the locale for std::wcout?
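Putting the two suggestions together might look like the sketch below. The function name setup_wide_output is made up for illustration, and whether a name such as "en_US.UTF-8" is accepted depends on the platform and runtime, so the sketch reports failure instead of assuming the locale exists.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>

// Hypothetical helper: detaches the C++ streams from C stdio, then imbues
// std::wcout with the named locale. Returns false if the locale name is
// not available on this system.
bool setup_wide_output(const char* locale_name) {
    // Suggested in the linked question; best called before any output is done.
    std::ios_base::sync_with_stdio(false);
    try {
        // std::locale throws std::runtime_error for an unknown name.
        std::wcout.imbue(std::locale(locale_name));
    } catch (const std::runtime_error&) {
        return false;
    }
    return true;
}
```

A caller could try setup_wide_output("en_US.UTF-8") first and fall back to the default locale if it returns false. Strings made only of ASCII characters should print the same either way.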

Upvotes: 1
