Fadd

Reputation: 824

How to add a UTF-8 BOM in Java?

I have a Java stored procedure which fetches records from a table using a ResultSet object and creates a CSV file.

BLOB retBLOB = BLOB.createTemporary(conn, true, BLOB.DURATION_SESSION);
retBLOB.open(BLOB.MODE_READWRITE);
OutputStream bOut = retBLOB.setBinaryStream(0L);

ZipOutputStream zipOut = new ZipOutputStream(bOut);
PrintStream out = new PrintStream(zipOut,false,"UTF-8");
out.write('\ufeff');
out.flush();

zipOut.putNextEntry(new ZipEntry("filename.csv"));
while (rs.next()){
    out.print("\"" + rs.getString(i) + "\"");
    out.print(",");
}
out.flush();

zipOut.closeEntry();
zipOut.close();
retBLOB.close();

return retBLOB;

But the generated CSV file doesn't show German characters correctly. The Oracle database also has an NLS_CHARACTERSET value of UTF8.

Please suggest.

Upvotes: 31

Views: 102187

Answers (9)

Olavi Vaino

Reputation: 451

Using StringBuilder

StringBuilder csv = new StringBuilder();    
csv.append('\ufeff');
csv.append(content);
csv.toString();

Upvotes: 1

timguy

Reputation: 2612

If you just want to modify the same file (without creating a new file and deleting the old one, as I had issues with that):

private void addBOM(File fileInput) throws IOException {
    try (RandomAccessFile file = new RandomAccessFile(fileInput, "rws")) {
        byte[] text = new byte[(int) file.length()];
        file.readFully(text);
        file.seek(0);
        byte[] bom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };
        file.write(bom);
        file.write(text);
    }
}

Upvotes: 2

Stephen C

Reputation: 719641

PrintStream#print

I think that out.write('\ufeff'); should actually be out.print('\ufeff');, calling the java.io.PrintStream#print method.

According to the javadoc, the write(int) method actually writes a single byte ... without any character encoding. So out.write('\ufeff'); writes only the low-order byte, 0xff. By contrast, the print(char) method encodes the character as one or more bytes using the stream's encoding, and then writes those bytes.

As noted in section 23.8 of the Unicode 9 specification, the BOM for UTF-8 is EF BB BF. That sequence is what you get when using UTF-8 encoding on '\ufeff'. See: Why UTF-8 BOM bytes efbbbf can be replaced by \ufeff?.
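
To make the difference concrete, here is a minimal sketch that captures the bytes in a ByteArrayOutputStream instead of a BLOB. The class and method names (BomWriteVsPrint, viaWrite, viaPrint, hex) are made up for this illustration and do not come from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class BomWriteVsPrint {
    // Bytes produced when U+FEFF is passed to PrintStream.write(int).
    static byte[] viaWrite() throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, false, "UTF-8");
        out.write('\ufeff'); // raw byte write: only the low 8 bits survive
        out.flush();
        return sink.toByteArray();
    }

    // Bytes produced when U+FEFF is passed to PrintStream.print(char).
    static byte[] viaPrint() throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, false, "UTF-8");
        out.print('\ufeff'); // encoded with the stream's charset (UTF-8)
        out.flush();
        return sink.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("write: " + hex(viaWrite())); // write: FF
        System.out.println("print: " + hex(viaPrint())); // print: EF BB BF
    }
}
```

Only the print(char) variant yields the three-byte sequence that tools such as spreadsheet applications look for at the start of a UTF-8 file.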

Upvotes: 10

Silent

Reputation: 133

Add this to the start of the CSV string:

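String CSV = "";
byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
// Decode explicitly as UTF-8 (java.nio.charset.StandardCharsets);
// new String(BOM) alone would use the platform default charset.
CSV = new String(BOM, StandardCharsets.UTF_8) + CSV;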

This works for me.

Upvotes: 8

David KELLER

Reputation: 656

Here is a simple way to prepend a BOM header to any file:

private static void appendBOM(File file) throws Exception {
    File bomFile = new File(file + ".bom");
    try (FileOutputStream output = new FileOutputStream(bomFile, true)) {
        byte[] bytes = FileUtils.readFileToByteArray(file);
        output.write('\ufeef'); // emits 0xef
        output.write('\ufebb'); // emits 0xbb
        output.write('\ufebf'); // emits 0xbf
        output.write(bytes);
        output.flush();
    }
    
    file.delete();
    bomFile.renameTo(file);
}

Upvotes: 0

astro

Reputation: 853

BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(...), StandardCharsets.UTF_8));
out.write('\ufeff');
out.write(...);

This correctly writes out 0xEF 0xBB 0xBF to the file, which is the UTF-8 representation of the BOM.
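
If you want to check this without touching the file system, a sketch along these lines writes to an in-memory buffer instead of the FileOutputStream. The names WriterBomCheck, encodeWithBom and startsWithUtf8Bom are invented for the example, and the sample CSV line is just placeholder data.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriterBomCheck {
    // Encodes a BOM followed by the given content as UTF-8, using a Writer.
    static byte[] encodeWithBom(String content) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            out.write('\ufeff'); // Writer.write(int) writes one char, which is then encoded
            out.write(content);
        } // closing the writer flushes the buffered characters into sink
        return sink.toByteArray();
    }

    // True if the data begins with the UTF-8 BOM bytes EF BB BF.
    static boolean startsWithUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xff) == 0xEF
                && (bytes[1] & 0xff) == 0xBB
                && (bytes[2] & 0xff) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder CSV line with German characters, written with escapes.
        byte[] bytes = encodeWithBom("\"Gr\u00fc\u00dfe\",\"Stra\u00dfe\"\n");
        System.out.println(startsWithUtf8Bom(bytes)); // true
    }
}
```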

Upvotes: 85

Christopher Schultz

Reputation: 20882

Just in case people are using PrintStreams, you need to do it a little differently. While a Writer encodes the single character '\ufeff' into the 3 bytes of the UTF-8 BOM, PrintStream.write(int) emits exactly one raw byte per call, so each of the 3 BOM bytes has to be written individually:

    // Print utf-8 BOM
    PrintStream out = System.out;
    out.write('\ufeef'); // emits 0xef
    out.write('\ufebb'); // emits 0xbb
    out.write('\ufebf'); // emits 0xbf

Alternatively, you can use the hex values for those directly:

    PrintStream out = System.out;
    out.write(0xef); // emits 0xef
    out.write(0xbb); // emits 0xbb
    out.write(0xbf); // emits 0xbf

Upvotes: 16

Rocio

Reputation: 11

In my case it works with this code:

PrintWriter out = new PrintWriter(new File(filePath), "UTF-8");
out.write(csvContent);
out.flush();
out.close();

Upvotes: 0

axtavt

Reputation: 242786

To write a BOM in UTF-8 you need PrintStream.print(), not PrintStream.write().

Also, if you want the BOM to end up in your CSV file, I guess you need to print it after putNextEntry().
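
Putting both points together, a rough sketch of the reordered code could look like this. It writes into a ByteArrayOutputStream rather than an Oracle BLOB, and the names ZipCsvWithBom, zipCsv and readFirstEntry are hypothetical helpers for the illustration, not part of the question's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipCsvWithBom {
    // Builds a zip archive with one UTF-8 CSV entry that starts with a BOM.
    static byte[] zipCsv(String entryName, String csv) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(sink)) {
            PrintStream out = new PrintStream(zipOut, false, "UTF-8");
            zipOut.putNextEntry(new ZipEntry(entryName)); // open the entry first
            out.print('\ufeff'); // print(), not write(): encoded as EF BB BF
            out.print(csv);
            out.flush(); // push everything into the entry before closing it
            zipOut.closeEntry();
        }
        return sink.toByteArray();
    }

    // Reads back the uncompressed bytes of the first entry in the archive.
    static byte[] readFirstEntry(byte[] zipBytes) throws IOException {
        try (ZipInputStream zipIn =
                new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            zipIn.getNextEntry();
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = zipIn.read(buffer)) != -1) {
                content.write(buffer, 0, n);
            }
            return content.toByteArray();
        }
    }
}
```

Calling readFirstEntry(zipCsv("filename.csv", data)) lets you confirm that the entry itself begins with EF BB BF, which is where a program opening the extracted CSV will look for it.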

Upvotes: 12
