mins

Reputation: 7524

Preventing a Unicode Byte Order Mark from being written in the middle of a file

This code writes two strings to a file channel:

final byte[] title = "Title: ".getBytes("UTF-16");
final byte[] body = "This is a string.".getBytes("UTF-16");
ByteBuffer titlebuf = ByteBuffer.wrap(title);
ByteBuffer bodybuf = ByteBuffer.wrap(body);
FileChannel fc = FileChannel.open(p, READ, WRITE, TRUNCATE_EXISTING);
fc.position(title.length); // second string written first, but not relevant to the problem
while (bodybuf.hasRemaining()) fc.write(bodybuf);
fc.position(0);
while (titlebuf.hasRemaining()) fc.write(titlebuf);

Each string is prefixed by a BOM.

[Title: ?T]  *254 255* 0 84 0 105 0 116 0 108 0 101 0 58 0 32 *254 255* 0 84

While it is OK to have a BOM at the beginning of the file, one in the middle of the stream creates a problem.

How can I prevent this from happening?

Upvotes: 0

Views: 594

Answers (1)

Wajdy Essam

Reputation: 4340

The BOM bytes are inserted when you call getBytes with the "UTF-16" charset, which encodes with a BOM:

final byte[] title = "Title: ".getBytes("UTF-16");

Check title.length and you will find that it includes 2 additional bytes for the BOM.

So you could strip the BOM from these arrays before wrapping them in a ByteBuffer, or skip it when you write the ByteBuffer to the file.
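A quick way to confirm this is to compare the encoded lengths. This is only a minimal sketch; the class and method names are made up for illustration and are not part of the question's code:

```java
import java.nio.charset.StandardCharsets;

final class BomLengthDemo {
    /** Returns true if the array starts with the big-endian BOM bytes 0xFE 0xFF. */
    static boolean startsWithBigEndianBom(byte[] bytes) {
        return bytes.length >= 2
                && (bytes[0] & 0xFF) == 0xFE
                && (bytes[1] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) {
        byte[] withBom = "Title: ".getBytes(StandardCharsets.UTF_16);
        byte[] withoutBom = "Title: ".getBytes(StandardCharsets.UTF_16BE);

        System.out.println(withBom.length);    // 16: 7 chars * 2 bytes + 2 BOM bytes
        System.out.println(withoutBom.length); // 14: 7 chars * 2 bytes, no BOM
        System.out.println(startsWithBigEndianBom(withBom)); // true
    }
}
```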
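One possible way to skip the BOM without copying the array is to wrap it with an offset. The helper below is a hypothetical sketch (wrapWithoutBom is not a JDK method); it relies only on ByteBuffer.wrap(byte[], int, int):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class BomSkipper {
    /** Wraps the array, skipping a leading UTF-16 BOM (FE FF or FF FE) if one is present. */
    static ByteBuffer wrapWithoutBom(byte[] bytes) {
        boolean hasBom = bytes.length >= 2
                && (((bytes[0] & 0xFF) == 0xFE && (bytes[1] & 0xFF) == 0xFF)
                 || ((bytes[0] & 0xFF) == 0xFF && (bytes[1] & 0xFF) == 0xFE));
        int offset = hasBom ? 2 : 0;
        // The buffer's position starts at offset, so writes begin after the BOM.
        return ByteBuffer.wrap(bytes, offset, bytes.length - offset);
    }

    public static void main(String[] args) {
        byte[] body = "This is a string.".getBytes(StandardCharsets.UTF_16);
        ByteBuffer bodybuf = wrapWithoutBom(body);
        System.out.println(body.length);         // 36: includes the 2 BOM bytes
        System.out.println(bodybuf.remaining()); // 34: BOM skipped
    }
}
```

In the question's code this would apply to bodybuf only; titlebuf keeps its BOM so the file still starts with one.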

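Another solution is to use a UTF-16 charset with explicit endianness (UTF-16LE or UTF-16BE), which does not write a BOM. Java's "UTF-16" encoder writes big-endian data after its BOM, so if the first string keeps its BOM, encode the following strings as UTF-16BE to match:

final byte[] body = "This is a string.".getBytes("UTF-16BE");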
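Putting this together with the code from the question, a sketch could look like the following. It assumes the file should start with exactly one BOM, so the title is encoded with UTF-16 (big-endian with BOM in Java) and the body with UTF-16BE (same byte order, no BOM). The class name and the temp file are illustrative only:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class SingleBomWriter {
    static void write(Path p, String title, String body) throws IOException {
        // First string: BOM followed by big-endian data.
        ByteBuffer titlebuf = ByteBuffer.wrap(title.getBytes(StandardCharsets.UTF_16));
        // Later strings: big-endian data only, no BOM.
        ByteBuffer bodybuf = ByteBuffer.wrap(body.getBytes(StandardCharsets.UTF_16BE));

        try (FileChannel fc = FileChannel.open(p, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            fc.position(titlebuf.remaining()); // body goes right after the title
            while (bodybuf.hasRemaining()) fc.write(bodybuf);
            fc.position(0);
            while (titlebuf.hasRemaining()) fc.write(titlebuf);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("single-bom", ".txt");
        write(p, "Title: ", "This is a string.");
        // Decoding as UTF-16 consumes the single leading BOM.
        System.out.println(new String(Files.readAllBytes(p), StandardCharsets.UTF_16));
        Files.delete(p);
    }
}
```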

Or you can use UTF-8 if UTF-16 is not required (Java's UTF-8 encoder does not write a BOM):

final byte[] title = "Title: ".getBytes("UTF-8");

Upvotes: 2
