Paul Dixon
Paul Dixon

Reputation: 4267

How do I print UTF-8 from c++ console application on Windows

For a C++ console application compiled with Visual Studio 2008 on English Windows (XP,Vista or 7). Is it possible to print out to the console and correctly display UTF-8 encoded Japanese using cout or wcout?

Upvotes: 30

Views: 61363

Answers (9)

Slaus
Slaus

Reputation: 2230

This should work:

#include <cstdio>
#include <windows.h>

#pragma execution_character_set( "utf-8" )

int main()
{
    SetConsoleOutputCP( 65001 ); // CP_UTF8
    printf( "Testing unicode -- English -- Ελληνικά -- Español -- Русский. aäbcdefghijklmnoöpqrsßtuüvwxyz\n" );
}

Don't know if it affects anything, but source file is saved as Unicode (UTF-8 with signature) - Codepage 65001 at FILE -> Advanced Save Options ....

Project -> Properties -> Configuration Properties -> General -> Character Set is set to Use Unicode Character Set.

Some say you need to change console font to Lucida Console, but on my side it is displayed with both Consolas and Lucida Console.

Upvotes: 22

BaiJiFeiLong
BaiJiFeiLong

Reputation: 4675

You can use system call:

#include <stdlib.h>
#include <stdio.h>

int main() {
    system("chcp 65001");
    printf("%s\n", "中文");
}

Upvotes: 2

yu yang Jian
yu yang Jian

Reputation: 7181

For anyone need to read UTF-8 from file and print to console can try wifstream, even in visual studio debugger shows UTF-8 words correctly (I'm processing traditional chinese), from this post:

#include <sstream>
#include <fstream>
#include <codecvt>

std::wstring readFile(const char* filename)
{
    std::wifstream wif(filename);
    wif.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
    std::wstringstream wss;
    wss << wif.rdbuf();
    return wss.str();
}
 
//  usage
std::wstring wstr2;
wstr2 = readFile("C:\\yourUtf8File.txt");
wcout << wstr2;

Upvotes: 0

Just for additional information:

'ANSI' refers to windows-125x, used for win32 applications while 'OEM' refers to the code page used by console/MS-DOS applications.
Current active code-pages can be retrieved with functions GetOEMCP() and GetACP().

In order to output something correctly to the console, you should:

  1. ensure the current OEM code page supports the characters you want to output
    (if necessary, use SetConsoleOutputCP to set it properly)

  2. convert the string from current ANSI code (win32) to the console OEM code page

Here are some utilities for doing so:

// Convert a UTF-16 string (16-bit) to an OEM string (8-bit) 
#define UNICODEtoOEM(str)   WCHARtoCHAR(str, CP_OEMCP)

// Convert an OEM string (8-bit) to a UTF-16 string (16-bit) 
#define OEMtoUNICODE(str)   CHARtoWCHAR(str, CP_OEMCP)

// Convert an ANSI string (8-bit) to a UTF-16 string (16-bit) 
#define ANSItoUNICODE(str)  CHARtoWCHAR(str, CP_ACP)

// Convert a UTF-16 string (16-bit) to an ANSI string (8-bit)
#define UNICODEtoANSI(str)  WCHARtoCHAR(str, CP_ACP)


/* Convert a single/multi-byte string to a UTF-16 string (16-bit).
 We take advantage of the MultiByteToWideChar function that allows to specify the charset of the input string.
*/
LPWSTR CHARtoWCHAR(LPSTR str, UINT codePage) {
    size_t len = strlen(str) + 1;
    int size_needed = MultiByteToWideChar(codePage, 0, str, len, NULL, 0);
    LPWSTR wstr = (LPWSTR) LocalAlloc(LPTR, sizeof(WCHAR) * size_needed);
    MultiByteToWideChar(codePage, 0, str, len, wstr, size_needed);
    return wstr;
}


/* Convert a UTF-16 string (16-bit) to a single/multi-byte string.
 We take advantage of the WideCharToMultiByte function that allows to specify the charset of the output string.
*/
LPSTR WCHARtoCHAR(LPWSTR wstr, UINT codePage) {
    size_t len = wcslen(wstr) + 1;
    int size_needed = WideCharToMultiByte(codePage, 0, wstr, len, NULL, 0, NULL, NULL);
    LPSTR str = (LPSTR) LocalAlloc(LPTR, sizeof(CHAR) * size_needed );
    WideCharToMultiByte(codePage, 0, wstr, len, str, size_needed, NULL, NULL);
    return str;
}

Upvotes: 0

adspx5
adspx5

Reputation: 771

On app start console set to default OEM437 CP. I was trying to output Unicode text to stdout, where console was switch to UTF8 translation _setmode(_fileno(stdout), _O_U8TEXT); and still had no luck on the screen even with Lucida TT font. If console was redirected to file, correct UTF8 file were created.

Finally I was lucky. I have added single line "info.FontFamily = FF_DONTCARE;" and it is working now. Hope this help for you.

void SetLucidaFont()
{
    HANDLE StdOut = GetStdHandle(STD_OUTPUT_HANDLE);
    CONSOLE_FONT_INFOEX info;
    memset(&info, 0, sizeof(CONSOLE_FONT_INFOEX));
    info.cbSize = sizeof(CONSOLE_FONT_INFOEX);              // prevents err=87 below
    if (GetCurrentConsoleFontEx(StdOut, FALSE, &info))
    {
        info.FontFamily   = FF_DONTCARE;
        info.dwFontSize.X = 0;  // leave X as zero
        info.dwFontSize.Y = 14;
        info.FontWeight   = 400;
        _tcscpy_s(info.FaceName, L"Lucida Console");
        if (SetCurrentConsoleFontEx(StdOut, FALSE, &info))
        {
        }
    }
}

Upvotes: 2

sbi
sbi

Reputation: 224189

Here's an article from MVP Michael Kaplan on how to correctly output UTF-16 through the console. You could convert your UTF-8 to UTF-16 and output that.

Upvotes: 8

ijprest
ijprest

Reputation: 4228

I've never actually tried setting the console code-page to UTF8 (not sure why it wouldn't work... the console can handle other multi-byte code-pages just fine), but there are a couple of functions to look up: SetConsoleCP and SetConsoleOutputCP.

You'll probably also need to make sure you're using a console font that is capable of displaying your characters. There's the SetCurrentConsoleFontEx function, but it's only available on Vista and above.

Hope that helps.

Upvotes: 4

dtb
dtb

Reputation: 217401

The Windows console uses the OEM code page by default to display output.

To change the code page to Unicode enter chcp 65001 in the console, or try to change the code page programmatically with SetConsoleOutputCP.

Note that you probably have to change the font of the console to one that has glyphs in the unicode range.

Upvotes: 10

Alan Haggai Alavi
Alan Haggai Alavi

Reputation: 74282

In the console, enter chcp 65001 to change the code page to that of UTF-8.

Upvotes: 0

Related Questions