cdyer

Reputation: 1259

ICU Byte Order Mark (BOM)

I am using ICU's ustdio functions to write a UnicodeString object to a file in a range of encodings, but they don't appear to prepend the BOM.

My Code:

void write_file(const char* filename, UnicodeString &str) {

    UFILE* f = u_fopen(filename, "w", NULL, "UTF-16 LE");
    u_file_write(str.getTerminatedBuffer(), str.length() + 1, f);
    u_fclose(f);
}

int _tmain(int argc, _TCHAR* argv[])
{
    UnicodeString str(L"ΠαρθένωνΗ");

    write_file("test.txt", str);

    return 0;
}

The byte order in the file does swap when I change LE to BE, but there is no BOM. The output file in a hex editor is:

A0 03 B1 03  C1 03 B8 03  AD 03 BD 03  C9 03 BD 03  97 03 00 00

NOTE: If I set the codepage to "UTF-16", there is a BOM, but as soon as I specify the endianness manually it disappears.

Alternatively, is there another way I could write the UnicodeString to a file with a BOM?

Upvotes: 3

Views: 1341

Answers (2)

Steven R. Loomis

Reputation: 4350

u_fputc(0x00feff,f);

will do it. Call it right after u_fopen, before writing the string.

Upvotes: 2

Mark Ransom

Reputation: 308206

Just guessing, but "UTF-16 LE" and "UTF-16 BE" were probably intended for cases where the byte order is already well specified by the context in which the file will be used, so a BOM is not necessary.

You should be able to write your own BOM character, '\ufeff', to the file as its first character.
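To make the byte-level effect concrete, here is a small standard-library-only sketch (no ICU) of what "writing your own BOM" produces in each byte order. The names ByteOrder, append_unit, and encode_with_bom are made up for this illustration and are not part of any library.

```cpp
#include <cstdint>
#include <string>
#include <vector>

enum class ByteOrder { Little, Big };

// Append one UTF-16 code unit to `out` in the requested byte order.
void append_unit(std::vector<std::uint8_t>& out, char16_t unit, ByteOrder order) {
    const std::uint8_t lo = static_cast<std::uint8_t>(unit & 0xFF);
    const std::uint8_t hi = static_cast<std::uint8_t>((unit >> 8) & 0xFF);
    if (order == ByteOrder::Little) {
        out.push_back(lo);
        out.push_back(hi);
    } else {
        out.push_back(hi);
        out.push_back(lo);
    }
}

// Serialize `text` as UTF-16 bytes with U+FEFF written as the first code unit.
std::vector<std::uint8_t> encode_with_bom(const std::u16string& text, ByteOrder order) {
    std::vector<std::uint8_t> out;
    append_unit(out, static_cast<char16_t>(0xFEFF), order);  // the BOM
    for (char16_t unit : text) {
        append_unit(out, unit, order);
    }
    return out;
}
```

For the first character of the question's string, Π (U+03A0), the little-endian result starts FF FE A0 03 and the big-endian result starts FE FF 03 A0. The BOM is just an ordinary code unit that gets byte-swapped along with everything else, which is why a reader can use it to detect the order.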

Upvotes: 5
