Reputation: 1396
I am developing a simple console application with Visual Studio 2013:
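#include <cstdlib>   // system
#include <iostream>  // std::wcin, std::wcout
#include <string>    // std::wstring
#include <tchar.h>   // _tmain, _TCHAR

int _tmain(int argc, _TCHAR* argv[])
{
    std::wstring name;
    std::wcout << L"Enter your name: ";
    std::wcin >> name;
    std::wcout << L"Hello, " << name << std::endl;
    system("pause");
    return 0;
}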
If I enter Ángel as input, the application works well and the output is
Hello, Ángel
The problem is that if I put a breakpoint on
std::wcout << L"Hello, " << name << std::endl;
the Visual Studio debugger shows
+ name L"µngel" std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >
Although the console output is correct, another part of the program calls the Win32 API function CopyFileW(), and it always fails because the path contains the substring Ángel, which reaches the function as µngel.
Upvotes: 1
Views: 233
Reputation: 98328
The problem is that Windows consoles are broken by default.
The problem arises from Windows using a different 8-bit codepage in console applications than in Windows applications. In Western Windows versions, the default 8-bit codepage (called ANSI) is Windows-1252, while the console 8-bit codepage (called OEM) is CP850.
Since your program doesn't know whether it is reading from the console or from a redirected file, it simply assumes ANSI input. But when you type Á, what arrives is actually the CP850 byte 0xB5. It is then interpreted using Windows-1252 as µ, that is, Unicode character U+00B5. The funny thing is that when you print it to the console, the inverse transformation happens, and you see an Á again. Two wrongs make one right!
But when you want to use those characters in a non-console context, it is actually a µ.
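To make the byte-level mix-up concrete, here is a minimal, portable sketch. The two decoder functions are hypothetical helpers written only for this illustration, and both are deliberately partial: they cover just enough of each codepage to show what happens to the byte 0xB5.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode one byte as Windows-1252 (the "ANSI" codepage).
// Only 0xA0-0xFF is handled, where Windows-1252 coincides with Latin-1,
// so each byte maps to the Unicode code point with the same value.
char32_t decode_cp1252_high(std::uint8_t byte) {
    assert(byte >= 0xA0);
    return static_cast<char32_t>(byte);
}

// Hypothetical helper: decode one byte as CP850 (the "OEM" codepage).
// Deliberately partial: ASCII plus the single byte discussed here.
char32_t decode_cp850_partial(std::uint8_t byte) {
    if (byte < 0x80) return byte;          // ASCII is shared by both codepages
    if (byte == 0xB5) return U'\u00C1';    // 0xB5 is 'Á' in CP850
    return U'\uFFFD';                      // everything else: not modelled in this sketch
}
```

Passing 0xB5 to decode_cp850_partial yields U+00C1 (Á), which is what was typed, while decode_cp1252_high yields U+00B5 (µ), which is what ends up in the std::wstring.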
You may think that you can convert from OEM to ANSI and then from ANSI to Unicode, and that would seem to work... until you run your program as:
c:\> myprogram < input.txt
And you wrote input.txt using Notepad, so it is in ANSI, and then you are doing a conversion you do not need.
You might then say that you could detect whether you are reading from the actual console or from a redirection, and do the OEM to ANSI conversion only when there is no redirection... until you do:
c:\> echo Ángel | myprogram
And you are doing it wrong again!
There are a lot of alternatives, but none of them works perfectly. At the very least you should use a Unicode font and a more normal codepage: something like chcp 1252 switches the console codepage to match the ANSI one. You can even make it the default with a bit of registry foo:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\OEMCP=1252
Upvotes: 4