Synetech

Reputation: 9925

Passing Unicode command-line arguments to a console app

I’m writing a console application that accepts filename arguments, and I want it to handle Unicode filenames. The problem is that I cannot figure out how to test it.

How can you pass Unicode arguments to a console app?

I tried creating a Unicode batch file that calls the program, passing it some Unicode characters, but it doesn’t work; the command-prompt can’t launch the program at all because it gets tripped up on the null-characters in its filename. I tried changing the code page to 65001 and Alt-typing a Unicode character at the command-line, but that didn’t work either.

Below is a sample program. I’m trying to find a way to get the following output:

C:\> unicodeargtest Foobar
46, 0, 6f, 0


// UnicodeArgTest.cpp
#define UNICODE
#include <tchar.h>
#include <stdio.h>
int wmain(int argc, wchar_t** argv) {
    if (argc < 2) return 1;  // no argument was given
    printf("%x, %x, %x, %x\n", argv[1][0], argv[1][1], argv[1][2], argv[1][3]);
    return 0;
}

Upvotes: 3

Views: 4521

Answers (2)

Synetech

Reputation: 9925

Oh blerg! It happened again. I come from an assembler background, so C++ occasionally trips me up. One thing I keep forgetting is that in C++ the compiler automatically scales array indexes and pointer arithmetic by the size of the element type.

For example:

DWORD dwa[4] = {1,2,3,4};
//dwa[2] references the third DWORD in the array (i.e., it starts at the ninth BYTE, offset 8),
//NOT the third BYTE in the array (offset 2)

or

struct EGS {
    char  str[5];
    int   num;
};
EGS   eg = {0};
EGS* peg = &eg;
peg++;
//peg is incremented by a whole EGS’ worth of bytes (sizeof(EGS)), NOT just 1
//for EGS, that is typically 12: str takes 5 bytes, 3 bytes of padding align num
//to a 4-byte boundary, and num takes 4 (5+3+4=12)

In this case, because the arguments are wide (2-byte) characters, argv[1][1] isn’t a null byte; it is the second Unicode character of the argument (for Foobar, 0x6f, the 'o').

Using the program as is and passing a Unicode character, I get this:

C:\>unicodeargtest ‽‽‽‽
203d, 203d, 203d, 203d

I simply pasted the interrobangs into the command prompt. In my normal command-prompt mode (Raster Fonts and code page 437), they display as ? instead of ‽, but the program still gives the same results.


By casting the argument pointer to a byte pointer (BYTE*, i.e., unsigned char*) like so:

// BYTE (from <windows.h>) is an unsigned char, so each byte prints as 0 to ff
printf("%x, %x, %x, %x\n",
    ((BYTE*)(argv[1]))[0], ((BYTE*)(argv[1]))[1],
    ((BYTE*)(argv[1]))[2], ((BYTE*)(argv[1]))[3]
);

I get the expected results:

C:\>unicodeargtest ‽‽‽‽
3d, 20, 3d, 20

C:\>unicodeargtest Foobar
46, 0, 6f, 0

Pasting Unicode characters works, but a batch file still doesn’t. A Unicode (UTF-16) batch file still fails because its null bytes garble the program’s filename, and saving it as UTF-8 keeps it from running at all.

Upvotes: 2

Harry Johnston

Reputation: 36348

Drag-and-drop should do the trick. In Explorer, drag the file whose name you want to pass as an argument onto the test executable. (You might first want to change the executable so that it waits before exiting.)

Upvotes: 1
