user1438233

Reputation: 1233

C++: can't get "wcout" to print Unicode and leave "cout" working

I can't get "wcout" to print a Unicode string with characters from multiple code pages while keeping "cout" working.

Please help me get these 3 lines to work together:

std::wcout<<"abc "<<L'\u240d'<<" defg "<<L'א'<<" hijk"<<std::endl;
std::cout<<"hello world from cout! \n";
std::wcout<<"hello world from wcout! \n";

Output:

abc hello world from cout!

I tried:

#include <io.h> 
#include <fcntl.h>
_setmode(_fileno(stdout), _O_U8TEXT);

Problem: "wcout" failed.

I also tried:

std::locale mylocale("");
std::wcout.imbue(mylocale);

and:

SetConsoleOutputCP(1251);

and

setlocale(LC_ALL, "");

and

SetConsoleCP(CP_UTF8);

Nothing worked.

Upvotes: 10

Views: 11036

Answers (4)

sesquized

Reputation: 140

The problem can be solved with the C++14 standard library plus the Microsoft-specific _setmode() function. I'm using Visual Studio 2022.

The steps needed for each transition are to flush the most recently used stream and then call _setmode(_fileno(stdout), _O_XXX) to prepare stdout to receive the right kind of data. Changing from wcout to binary data requires two calls to _setmode().

One reason that both the flush and _setmode are necessary is to avoid crashes from having an odd number of bytes in the buffer when the library is expecting two-byte values. (Of course, if you are continuing to use the same stream, it isn't necessary to do anything until a change is made.)

You can tell I'm using Windows because each "\n" is translated to "\r\n", except in the binary output.

#include <cstdint>
#include <iostream>
#include <io.h>
#include <fcntl.h>

int main( )
{

  // To start sending to the console in UNICODE/wcout
  _setmode(_fileno(stdout), _O_U16TEXT);
  std::wcout << "abc " << L'\u240d' << " defg " << L'א' << " hijk" << std::endl;

  //then to switch to text/cout
  std::wcout.flush();
  _setmode(_fileno(stdout), _O_TEXT);
  std::cout << "hello world from cout! \n";

  // to switch back to wcout
  std::cout.flush();
  _setmode(_fileno(stdout), _O_U16TEXT);
  std::wcout << "hello world from wcout! \n";

  // To switch from wcout to binary output: (One reason to use cout 
  // is that wcout will not accept an odd number of bytes)
  std::wcout.flush();
  _setmode(_fileno(stdout), _O_TEXT);
  _setmode(_fileno(stdout), _O_BINARY);
  uint8_t bytes[]{ 8, 0xa0, 0xff, 0x7f, '\n', 0x00, 0x31, 0x5a };
  std::cout.write((char*)bytes, sizeof(bytes));

  // to switch back to wcout:
  std::cout.flush();
  _setmode(_fileno(stdout), _O_TEXT);
  _setmode(_fileno(stdout), _O_U16TEXT);
  std::wcout << "Done" << std::endl;

  return 0;
}

Piping it through od -tcx1 gives

0000000   a  \0   b  \0   c  \0      \0  \r   $      \0   d  \0   e  \0
         61  00  62  00  63  00  20  00  0d  24  20  00  64  00  65  00
0000020   f  \0   g  \0      \0 320 005      \0   h  \0   i  \0   j  \0
         66  00  67  00  20  00  d0  05  20  00  68  00  69  00  6a  00
0000040   k  \0  \r  \0  \n  \0   h   e   l   l   o       w   o   r   l
         6b  00  0d  00  0a  00  68  65  6c  6c  6f  20  77  6f  72  6c
0000060   d       f   r   o   m       c   o   u   t   !      \r  \n   h
         64  20  66  72  6f  6d  20  63  6f  75  74  21  20  0d  0a  68
0000100  \0   e  \0   l  \0   l  \0   o  \0      \0   w  \0   o  \0   r
         00  65  00  6c  00  6c  00  6f  00  20  00  77  00  6f  00  72
0000120  \0   l  \0   d  \0      \0   f  \0   r  \0   o  \0   m  \0    
         00  6c  00  64  00  20  00  66  00  72  00  6f  00  6d  00  20
0000140  \0   w  \0   c  \0   o  \0   u  \0   t  \0   !  \0      \0  \r
         00  77  00  63  00  6f  00  75  00  74  00  21  00  20  00  0d
0000160  \0  \n  \0  \b 240 377 177  \n  \0   1   Z   D  \0   o  \0   n
         00  0a  00  08  a0  ff  7f  0a  00  31  5a  44  00  6f  00  6e
0000200  \0   e  \0  \r  \0  \n  \0
         00  65  00  0d  00  0a  00
0000207
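
To read the dump: the _O_U16TEXT output is UTF-16 little-endian, so each 16-bit code unit appears low byte first. The sketch below (utf16le_bytes is a made-up helper name, used only for illustration) shows why U+240D shows up as 0d 24 and א (U+05D0) as d0 05.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split one UTF-16 code unit into its
// little-endian byte pair (low byte first, then high byte).
std::array<std::uint8_t, 2> utf16le_bytes(char16_t unit) {
    return { static_cast<std::uint8_t>(unit & 0xFF),
             static_cast<std::uint8_t>((unit >> 8) & 0xFF) };
}
```

For U+240D the two bytes are 0x0d and 0x24, which are '\r' and '$' in ASCII, and that is exactly what od prints in its character row.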

Upvotes: 0

dev7060

Reputation: 120

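It's because the Unicode characters are not representable in the console's code page, so the conversion fails and wcout is left in a failed state. Until that state is cleared, further output to wcout is ignored: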

std::wcout<<"abc "<<L'\u240d'<<" defg "<<L'א'<<" hijk"<<std::endl;
if(std::wcout.fail()){
    std::cout<<"\nConversion didn't succeed\n";
    std::wcout << "This statement has no effect on the console";
    std::wcout.clear();
    std::wcout<<"hello world from wcout! \n";
}
std::cout<<"hello world from cout! \n";
std::wcout<<"hello world from wcout again! \n";
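
That "no effect" behaviour is ordinary iostream behaviour: once failbit is set, formatted output is skipped until clear() is called. Here is a minimal portable sketch of that; it uses a std::wostringstream and sets failbit by hand as a stand-in for the failed console conversion, and the function name is only illustrative.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Illustrative only: simulate a failed wide stream, write to it,
// clear the error, then write again. Returns what actually got stored.
std::wstring write_through_failure() {
    std::wostringstream out;
    out.setstate(std::ios_base::failbit); // stand-in for the conversion failure
    out << L"lost";                       // skipped: the stream is not good()
    out.clear();                          // reset the error state
    out << L"kept";                       // written normally
    return out.str();
}
```

Only the text written after clear() ends up in the buffer, which matches what happens with wcout in the snippet above.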

Upvotes: 2

Davislor

Reputation: 15134

Microsoft requires a bit of non-standard set-up with _setmode() before wcout or wcin can work. This example is pretty heavy on the boilerplate, so not as clear as it could possibly be, but it runs on clang++, g++ and MSVC++:

#include <iostream>
#include <locale>
#include <locale.h>
#include <stdlib.h>

#ifndef MS_STDLIB_BUGS // Allow overriding the autodetection.
/* The Microsoft C and C++ runtime libraries that ship with Visual Studio, as
 * of 2017, have a bug that neither stdio, iostreams nor wide iostreams can
 * handle Unicode input or output.  Windows needs some non-standard magic to
 * work around that.  This includes programs compiled with MinGW and Clang
 * for the win32 and win64 targets.
 */
#  if ( _MSC_VER || __MINGW32__ || __MSVCRT__ )
    /* This code is being compiled either on MS Visual C++, or MinGW, or
     * clang++ in compatibility mode for either, or is being linked to the
     * msvcrt.dll runtime.
     */
#    define MS_STDLIB_BUGS 1
#  else
#    define MS_STDLIB_BUGS 0
#  endif
#endif

#if MS_STDLIB_BUGS
#  include <io.h>
#  include <fcntl.h>
#endif

#if !HAS_CPP17_FILESYSTEM && !HAS_TS_FILESYSTEM && __has_include(<filesystem>)
#  include <filesystem> /* MSVC has this header, but not the standard API. */
#  if __cpp_lib_filesystem >= 201703
#    define HAS_CPP17_FILESYSTEM 1
#  endif
#endif

#if !HAS_CPP17_FILESYSTEM && __has_include(<experimental/filesystem>)
#  include <experimental/filesystem>
/* Microsoft screws this one up, too, by not defining the feature-test
 * macro specified by the standard.
 */
#  if __cpp_lib_experimental_filesystem >= 201406 || MS_STDLIB_BUGS
#    define HAS_TS_FILESYSTEM 1
/* With g++6, this requires -lstdc++fs, AFTER this source file on the
 * command line.
 */
#  endif
#endif

#if HAS_CPP17_FILESYSTEM
  using std::filesystem::absolute;
  using std::filesystem::current_path;
  using std::filesystem::directory_entry;
  using std::filesystem::directory_iterator;
  using std::filesystem::is_directory;
  using std::filesystem::exists;
  using std::filesystem::path;
#elif HAS_TS_FILESYSTEM
  using std::experimental::filesystem::absolute;
  using std::experimental::filesystem::current_path;
  using std::experimental::filesystem::directory_entry;
  using std::experimental::filesystem::directory_iterator;
  using std::experimental::filesystem::is_directory;
  using std::experimental::filesystem::exists;
  using std::experimental::filesystem::path;
#else
#  error "This library has neither <filesystem> nor <experimental/filesystem>."
#endif

void init_locale(void)
// Does magic so that wcout can work.
{
#if MS_STDLIB_BUGS
  // Windows needs a little non-standard magic.
  constexpr char cp_utf16le[] = ".1200"; // UTF-16 little-endian locale.
  setlocale( LC_ALL, cp_utf16le );
  _setmode( _fileno(stdout), _O_WTEXT );
  /* Repeat for _fileno(stdin), if needed. */
#else
  // The correct locale name may vary by OS, e.g., "en_US.utf8".
  constexpr char locale_name[] = "";
  setlocale( LC_ALL, locale_name );
  std::locale::global(std::locale(locale_name));
  std::wcin.imbue(std::locale());
  std::wcout.imbue(std::locale());
#endif
}

using std::endl;

int main( const int argc, const char * const argv[] )
{
  init_locale();

  const path cwd = (argc > 1) ? absolute(path( argv[1], std::locale() ))
                              : absolute(current_path());

  if (exists(cwd)) {
    std::wcout << cwd.wstring() << endl;
  } else {
    std::wcerr << "Path does not exist.\n";
    return EXIT_FAILURE;
  }

  if (is_directory(cwd)) {
    for ( const directory_entry &f : directory_iterator(cwd) )
      std::wcout << f.path().filename().wstring() << endl;
  }

  return EXIT_SUCCESS;
}

That’s probably a lot more complicated than it really needed to be: std::filesystem is unsupported as of 2018, but <experimental/filesystem> is never going to be removed.

Here’s a simplified version that includes only the boilerplate to get wcout to work:

#include <iostream>
#include <locale>
#include <locale.h>

#ifndef MS_STDLIB_BUGS
#  if ( _MSC_VER || __MINGW32__ || __MSVCRT__ )
#    define MS_STDLIB_BUGS 1
#  else
#    define MS_STDLIB_BUGS 0
#  endif
#endif

#if MS_STDLIB_BUGS
#  include <io.h>
#  include <fcntl.h>
#endif

void init_locale(void)
{
#if MS_STDLIB_BUGS
  constexpr char cp_utf16le[] = ".1200";
  setlocale( LC_ALL, cp_utf16le );
  _setmode( _fileno(stdout), _O_WTEXT );
#else
  // The correct locale name may vary by OS, e.g., "en_US.utf8".
  constexpr char locale_name[] = "";
  setlocale( LC_ALL, locale_name );
  std::locale::global(std::locale(locale_name));
  std::wcin.imbue(std::locale());
  std::wcout.imbue(std::locale());
#endif
}
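
One caveat on the non-Windows branch: std::locale("") throws std::runtime_error if the environment names a locale the system does not have installed. A hedged sketch of a fallback follows; user_locale_or_classic is a hypothetical helper, not part of the code above.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: try the user's preferred locale and fall back
// to the classic "C" locale if the environment's locale is unavailable.
std::locale user_locale_or_classic() {
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}
```

The result could then be passed to std::locale::global() and imbue() in place of std::locale(locale_name). With the classic locale, non-ASCII wide output may still fail, but the program at least starts.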

Upvotes: 15

Lightness Races in Orbit

Reputation: 385088

C++ says:

[C++11: 27.4.1/3]: Mixing operations on corresponding wide- and narrow-character streams follows the same semantics as mixing such operations on FILEs, as specified in Amendment 1 of the ISO C standard.

And the referenced document says:

The definition of a stream was changed to include the concept of an orientation for both text and binary streams. After a stream is associated with a file, but before any operations are performed on the stream, the stream is without orientation. If a wide-character input or output function is applied to a stream without orientation, the stream becomes wide-oriented. Likewise, if a byte input or output operation is applied to a stream without orientation, the stream becomes byte-oriented. Thereafter, only the fwide() or freopen() functions can alter the orientation of a stream.

Byte input/output functions shall not be applied to a wide-oriented stream and wide-character input/output functions shall not be applied to a byte-oriented stream.

By my interpretation this means, in short, do not mix std::cout and std::wcout.
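
The orientation rule can be observed on an ordinary FILE with fwide(), which only queries the orientation when passed mode 0. This is a small sketch with an illustrative function name. It assumes a C library that implements fwide() (glibc does, while Microsoft's documentation describes its version as unimplemented, so results there will differ) and that tmpfile() succeeds.

```cpp
#include <cstdio>
#include <cwchar>

// Illustrative only: perform one write on a fresh stream and report its
// orientation. Returns < 0 for byte-oriented, > 0 for wide-oriented,
// and 0 if the stream has no orientation or tmpfile() failed.
int orientation_after_first_write(bool wide_first) {
    std::FILE* f = std::tmpfile();
    if (!f) return 0;
    if (wide_first) {
        std::fputws(L"wide", f);   // first operation is a wide-character one
    } else {
        std::fputs("narrow", f);   // first operation is a byte one
    }
    const int orientation = std::fwide(f, 0); // mode 0: query, do not change
    std::fclose(f);
    return orientation;
}
```

The first operation fixes the orientation, which is why interleaving std::cout and std::wcout on the same underlying stdout is not something the standard promises will work.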

Upvotes: 11
