Reputation: 4267
For a C++ console application compiled with Visual Studio 2008 on English Windows (XP, Vista, or 7), is it possible to print to the console and correctly display UTF-8-encoded Japanese using cout or wcout?
Upvotes: 30
Views: 61363
Reputation: 2230
This should work:
#include <cstdio>
#include <windows.h>

// Ask MSVC to store narrow string literals as UTF-8 bytes.
#pragma execution_character_set( "utf-8" )

int main()
{
    SetConsoleOutputCP( CP_UTF8 ); // 65001: the console interprets output bytes as UTF-8
    printf( "Testing unicode -- English -- Ελληνικά -- Español -- Русский. aäbcdefghijklmnoöpqrsßtuüvwxyz\n" );
    return 0;
}
I don't know whether it matters, but the source file is saved as "Unicode (UTF-8 with signature) - Codepage 65001" via File -> Advanced Save Options....
Project -> Properties -> Configuration Properties -> General -> Character Set is set to Use Unicode Character Set.
Some say you need to change the console font to Lucida Console, but on my machine the text is displayed correctly with both Consolas and Lucida Console.
Upvotes: 22
Reputation: 4675
You can call system() to run chcp before printing (this relies on the compiler storing the string literal as UTF-8):
#include <stdlib.h>
#include <stdio.h>
int main() {
system("chcp 65001");
printf("%s\n", "中文");
}
Upvotes: 2
Reputation: 7181
If you need to read UTF-8 from a file and print it to the console, you can try wifstream; even the Visual Studio debugger then shows the UTF-8 text correctly (I'm processing Traditional Chinese). Adapted from this post:
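#include <fcntl.h>   // _O_U16TEXT
#include <io.h>      // _setmode
#include <cstdio>    // _fileno
#include <codecvt>
#include <fstream>
#include <iostream>
#include <locale>
#include <sstream>
#include <string>

// Read a whole UTF-8 file into a wide string; consume_header skips a UTF-8 BOM if present.
// Note: wchar_t is 16-bit on Windows, so characters outside the BMP need std::codecvt_utf8_utf16.
std::wstring readFile(const char* filename)
{
    std::wifstream wif;
    // Imbue before opening so the UTF-8 facet applies to the whole file.
    wif.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::consume_header>));
    wif.open(filename);
    std::wstringstream wss;
    wss << wif.rdbuf();
    return wss.str();
}

// usage
int main()
{
    // Put stdout in UTF-16 mode; otherwise wcout cannot display non-ASCII text in the console.
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wstring wstr2 = readFile("C:\\yourUtf8File.txt");
    std::wcout << wstr2;
}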
Upvotes: 0
Reputation: 910
Just for additional information:
'ANSI' refers to the Windows-125x code pages used by Win32 applications, while 'OEM' refers to the code page used by console/MS-DOS applications.
The currently active code pages can be retrieved with the functions GetOEMCP() and GetACP().
In order to output something correctly to the console, you should:
- ensure the console's output code page (the OEM code page by default) supports the characters you want to output (if necessary, use SetConsoleOutputCP to set it properly);
- convert the string from the current ANSI code page (Win32) to that console code page.
Here are some utilities for doing so:
#include <windows.h>

// Convert a UTF-16 string (16-bit) to an OEM string (8-bit)
#define UNICODEtoOEM(str) WCHARtoCHAR(str, CP_OEMCP)
// Convert an OEM string (8-bit) to a UTF-16 string (16-bit)
#define OEMtoUNICODE(str) CHARtoWCHAR(str, CP_OEMCP)
// Convert an ANSI string (8-bit) to a UTF-16 string (16-bit)
#define ANSItoUNICODE(str) CHARtoWCHAR(str, CP_ACP)
// Convert a UTF-16 string (16-bit) to an ANSI string (8-bit)
#define UNICODEtoANSI(str) WCHARtoCHAR(str, CP_ACP)

/* Convert a single/multi-byte string to a UTF-16 string (16-bit).
   MultiByteToWideChar lets us specify the code page of the input string.
   A length of -1 means the input is NUL-terminated, so the terminator is converted too.
   The result is allocated with LocalAlloc (free it with LocalFree); NULL on failure.
*/
LPWSTR CHARtoWCHAR(LPCSTR str, UINT codePage) {
    int size_needed = MultiByteToWideChar(codePage, 0, str, -1, NULL, 0);
    if (size_needed == 0)
        return NULL;
    LPWSTR wstr = (LPWSTR) LocalAlloc(LPTR, sizeof(WCHAR) * size_needed);
    if (wstr != NULL)
        MultiByteToWideChar(codePage, 0, str, -1, wstr, size_needed);
    return wstr;
}

/* Convert a UTF-16 string (16-bit) to a single/multi-byte string.
   WideCharToMultiByte lets us specify the code page of the output string.
   Same conventions as above: NUL-terminated input, LocalFree the result, NULL on failure.
*/
LPSTR WCHARtoCHAR(LPCWSTR wstr, UINT codePage) {
    int size_needed = WideCharToMultiByte(codePage, 0, wstr, -1, NULL, 0, NULL, NULL);
    if (size_needed == 0)
        return NULL;
    LPSTR str = (LPSTR) LocalAlloc(LPTR, sizeof(CHAR) * size_needed);
    if (str != NULL)
        WideCharToMultiByte(codePage, 0, wstr, -1, str, size_needed, NULL, NULL);
    return str;
}
Upvotes: 0
Reputation: 771
On application start, the console uses the default OEM code page (437 here). I was trying to output Unicode text to stdout after switching stdout to UTF-8 translation with _setmode(_fileno(stdout), _O_U8TEXT);, but still had no luck on screen, even with the Lucida TrueType font. When the output was redirected to a file, a correct UTF-8 file was created.
Finally I got lucky: I added the single line "info.FontFamily = FF_DONTCARE;" and now it works. I hope this helps.
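#include <windows.h>
#include <string.h>
#include <wchar.h>

// Switch the console font to Lucida Console (SetCurrentConsoleFontEx requires Vista or later).
bool SetLucidaFont()
{
    HANDLE StdOut = GetStdHandle(STD_OUTPUT_HANDLE);
    CONSOLE_FONT_INFOEX info;
    memset(&info, 0, sizeof(CONSOLE_FONT_INFOEX));
    info.cbSize = sizeof(CONSOLE_FONT_INFOEX); // prevents err=87 (ERROR_INVALID_PARAMETER) below
    if (!GetCurrentConsoleFontEx(StdOut, FALSE, &info))
        return false;
    info.FontFamily = FF_DONTCARE;   // the line that made it work
    info.dwFontSize.X = 0;           // leave X as zero
    info.dwFontSize.Y = 14;
    info.FontWeight = FW_NORMAL;     // 400
    // FaceName is always a WCHAR array, so use wcscpy_s rather than _tcscpy_s.
    wcscpy_s(info.FaceName, L"Lucida Console");
    return SetCurrentConsoleFontEx(StdOut, FALSE, &info) != FALSE;
}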
Upvotes: 2
Reputation: 224189
Here's an article from MVP Michael Kaplan on how to correctly output UTF-16 through the console. You could convert your UTF-8 to UTF-16 and output that.
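The article itself isn't reproduced here. As a minimal sketch of the "convert UTF-8 to UTF-16 and output that" idea, the block below assumes a Windows console for the output step and uses hypothetical helper names (utf8_to_utf16, write_utf8_to_console). The decoder is written by hand so it compiles on any platform; on Windows, MultiByteToWideChar with CP_UTF8 is the more usual choice. WriteConsoleW takes UTF-16 directly, so it does not depend on the console code page, although the font still needs the glyphs.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Decode UTF-8 into UTF-16 code units (hypothetical helper, portable C++).
// Malformed input emits U+FFFD and skips one byte, so each bad byte yields
// one replacement character.
std::u16string utf8_to_utf16(const std::string& in)
{
    static const char32_t min_for_len[] = {0, 0x80, 0x800, 0x10000}; // rejects overlong forms
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes
        if (c < 0x80)              { cp = c;        extra = 0; }
        else if ((c >> 5) == 0x06) { cp = c & 0x1F; extra = 1; }
        else if ((c >> 4) == 0x0E) { cp = c & 0x0F; extra = 2; }
        else if ((c >> 3) == 0x1E) { cp = c & 0x07; extra = 3; }
        else { out.push_back(u'\uFFFD'); ++i; continue; } // invalid lead byte

        bool ok = in.size() - i > extra; // enough bytes left for the sequence?
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (cc & 0x3F);
        }
        if (!ok || cp < min_for_len[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(u'\uFFFD');
            ++i;
            continue;
        }
        i += extra + 1;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else { // outside the BMP: encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

#ifdef _WIN32
// Write UTF-8 text to the console as UTF-16 (hypothetical helper).
// Returns false if WriteConsoleW fails, e.g. when stdout is redirected to a file.
bool write_utf8_to_console(const std::string& utf8)
{
    std::u16string u16 = utf8_to_utf16(utf8);
    std::wstring w(u16.begin(), u16.end()); // wchar_t is 16-bit on Windows
    DWORD written = 0;
    return WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), w.data(),
                         static_cast<DWORD>(w.size()), &written, NULL) != 0;
}
#endif
```

Passing a UTF-8 std::string containing Japanese text to write_utf8_to_console should then display it, provided the console font has the glyphs. Because WriteConsoleW only works on a real console handle, a redirected stdout needs a different path, such as writing the original UTF-8 bytes to the file.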
Upvotes: 8
Reputation: 4228
I've never actually tried setting the console code page to UTF-8 (I'm not sure why it wouldn't work; the console handles other multi-byte code pages just fine), but there are a couple of functions to look up: SetConsoleCP (input) and SetConsoleOutputCP (output).
You'll probably also need to make sure you're using a console font that is capable of displaying your characters. There's the SetCurrentConsoleFontEx function, but it's only available on Vista and above.
Hope that helps.
Upvotes: 4
Reputation: 217401
The Windows console uses the OEM code page by default to display output.
To change the code page to UTF-8, enter chcp 65001 in the console, or try changing the code page programmatically with SetConsoleOutputCP.
Note that you probably also have to change the console font to one that has glyphs for the Unicode range you are printing.
Upvotes: 10
Reputation: 74282
In the console, enter chcp 65001 to change the code page to UTF-8.
Upvotes: 0