Reputation: 193
What encoding does Windows use for command line parameters passed to programs started in a cmd.exe window?
The encoding of command line parameters doesn't seem to be affected by the console code page set using chcp. (I set it to UTF-8, code page 65001, and use the Lucida Console font.)
If I paste an EN DASH, encoded as hex E28093, from a UTF-8 file into a command line, it is displayed correctly in the cmd.exe window. However, it seems to be translated to hex 96 (an ANSI representation) when it is passed to the program. If I paste Cyrillic characters into a command line, they are also displayed correctly, but appear in the program as question marks (hex 3F).
If I copy a command line and paste it into a text file, the resulting file is UTF-8; it contains the same encoding of the EN DASH and Cyrillic characters as the source file.
It appears the characters pasted into the cmd.exe window are captured and displayed using the code page selected with chcp, but some ANSI code page is used to translate the characters into a different encoding before passing them as parameters to a program. Characters that cannot be converted are apparently silently replaced with question marks.
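The behaviour I describe can be reproduced outside Windows with a small Python sketch. It assumes the ANSI code page involved is Windows-1252 (cp1252), which is only a guess consistent with the hex 96 I see; the helper name to_ansi and the sample Cyrillic word are made up for illustration, and Python's "replace" error handling is only an approximation of whatever conversion Windows actually performs.

```python
# Assumption: the ANSI code page is Windows-1252 (cp1252). This is a guess
# based on the observed byte 0x96, not something confirmed by the system.
EN_DASH = "\u2013"
CYRILLIC = "Привет"  # hypothetical sample text, six Cyrillic letters


def to_ansi(text: str, codepage: str = "cp1252") -> bytes:
    """Encode text lossily, replacing unmappable characters with '?'."""
    return text.encode(codepage, errors="replace")


if __name__ == "__main__":
    print(EN_DASH.encode("utf-8").hex())  # e28093 (the bytes in the UTF-8 file)
    print(to_ansi(EN_DASH).hex())         # 96 (what the program receives)
    print(to_ansi(CYRILLIC))              # b'??????' (no cp1252 mapping)
```

Under that assumption the output matches what I observe: the EN DASH survives as a single ANSI byte, while the Cyrillic letters have no equivalent in the code page and collapse to hex 3F.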
So, if I want to correctly handle command line parameters in a program, I need to know exactly what the encoding of the parameters is. For example, if I wish to compare command line parameters with known UTF-8 data read from a file, I need to convert the parameters from the correct encoding to UTF-8. Thanks.
Upvotes: 6
Views: 2122
Reputation: 101764
If your goal is to compare Unicode characters, then you should call GetCommandLineW in your program (or use wmain so that argv uses wchar_t) and then convert this UTF-16LE command line string to UTF-8, or vice versa. GetCommandLineA probably converts the Unicode source string with CP_ACP, the system ANSI code page.
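The conversion step itself is simple. Here is a minimal sketch in Python rather than C, so it runs anywhere: it does not call GetCommandLineW, it only simulates a wide argument as UTF-16LE bytes and compares it with UTF-8 data such as you would read from a file. The function names utf16le_to_utf8 and matches_known_utf8 and the sample string are hypothetical.

```python
def utf16le_to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16LE bytes (as in a wide command line) to UTF-8."""
    return raw.decode("utf-16-le").encode("utf-8")


def matches_known_utf8(arg_utf16le: bytes, known_utf8: bytes) -> bool:
    """Compare a wide-character argument with UTF-8 data byte for byte."""
    return utf16le_to_utf8(arg_utf16le) == known_utf8


if __name__ == "__main__":
    sample = "\u2013 Привет"                # hypothetical parameter text
    wide_arg = sample.encode("utf-16-le")   # stand-in for a wchar_t argument
    file_data = sample.encode("utf-8")      # stand-in for data read from a file
    print(matches_known_utf8(wide_arg, file_data))  # True
```

Because the wide string never passes through an ANSI code page, nothing is replaced with question marks before the comparison. In a C program the same step would typically be done with WideCharToMultiByte and CP_UTF8.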
Upvotes: 3