deHaar

Reputation: 18568

Write a text file encoded in UTF-8 with a BOM through java.nio

I have to write the output of a database query to a csv file.

Unfortunately, many people in my company are not able to use a nice editor like Notepad++ and keep opening csv files with Excel.

When I write a text/csv file using java.nio like this

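import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

public static void main(String[] args) {
    Path path = Paths.get("U:\\temp\\TestOutput\\csv_file.csv");
    List<String> lines = Arrays.asList("Übernahme", "Außendarstellung", "€", "@", "UTF-8?");

    try {
        // CREATE_NEW makes the call fail if the file already exists
        Files.write(path, lines, StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW);
    } catch (IOException e) {
        e.printStackTrace();
    }
}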

the file gets created successfully and is encoded in UTF-8.

The problem is that the file has no BOM: the encoding label at the bottom right of Notepad++ shows UTF-8. That is no problem for Notepad++

but obviously it is for Excel

and when I use Notepad++'s option Encoding > Convert to UTF-8-BOM, save & close it and open the file in Excel afterwards, it correctly displays all the values, no encoding issues are left.

That leads to the following question:

Can I force java.nio.file.Files.write(...) to add a BOM when using StandardCharsets.UTF_8, or is there any other way in java.nio to achieve the desired encoding?

Upvotes: 3

Views: 3418

Answers (2)

Zafar

Reputation: 31

Instead of "\uFEFF" + "Übernahme", I suggest using "\uFEFF", "Übernahme", so that the BOM is a separate value. The benefit is that the actual data of the file is not changed. If you use the opencsv API, with the headers in the first line and the data from the second line on, adding a "," after the BOM character keeps the header intact, without any prefix on the first header name. If the header name were changed by the prefix, you would also have to update the code that maps headers to data. If you keep the header-to-data mapping in a properties file, you only have to add one extra mapping for the BOM there: "\uFEFF"=TEMP.
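
As a rough illustration of the difference described above, the following sketch builds a header line both ways using plain string handling only (no opencsv). The class name BomHeaderSketch, the two helper methods and the column names "id" and "name" are hypothetical and only serve the example.

```java
public class BomHeaderSketch {

    static final String BOM = "\uFEFF";

    /** BOM glued onto the first header name: the first field becomes BOM + name. */
    static String prefixedHeader(String... names) {
        return BOM + String.join(",", names);
    }

    /** BOM as its own leading field: the header names themselves stay unchanged. */
    static String separateFieldHeader(String... names) {
        return BOM + "," + String.join(",", names);
    }

    public static void main(String[] args) {
        String prefixed = prefixedHeader("id", "name");
        String separate = separateFieldHeader("id", "name");

        // First header name no longer matches because it carries the BOM
        System.out.println(prefixed.split(",")[0].equals("id")); // false
        // With a separate BOM field, the header name is intact in the second column
        System.out.println(separate.split(",")[1].equals("id")); // true
    }
}
```

Note that the separate-field variant adds one extra leading column, so every data row presumably needs a matching extra leading field to keep the columns aligned.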

Upvotes: 0

Pavel Smirnov

Reputation: 4799

As far as I know, there's no direct way in the standard Java NIO library to write text files in UTF-8 with BOM format.

But that's not a problem, since the BOM is nothing but a special character, \uFEFF, at the start of a text stream (in UTF-8 it is encoded as the bytes EF BB BF). Just add it manually to the CSV file, for example:

List<String> lines = 
    Arrays.asList("\uFEFF" + "Übernahme", "Außendarstellung", "€", "@", "UTF-8?");
        ...
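
For completeness, here is a minimal, self-contained sketch of the same idea that writes the BOM once through a BufferedWriter instead of prepending it to the first list element, and then checks the first three bytes of the result. The class name BomCsvWriter, the method writeWithBom and the temporary file are assumptions made for this example, not part of the question's code. The sample values are written with Unicode escapes so that the source compiles regardless of the compiler's encoding setting.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class BomCsvWriter {

    /** Writes the lines as UTF-8, preceded by a single BOM character. */
    static void writeWithBom(Path path, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            writer.write('\uFEFF'); // encoded by the UTF-8 writer as EF BB BF
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output location; replace with the real target path
        Path path = Files.createTempFile("csv_file", ".csv");

        // Same sample values as in the question, written as Unicode escapes
        List<String> lines = Arrays.asList(
                "\u00DCbernahme", "Au\u00DFendarstellung", "\u20AC", "@", "UTF-8?");
        writeWithBom(path, lines);

        // Show the first three bytes of the file
        byte[] bytes = Files.readAllBytes(path);
        System.out.printf("%02X %02X %02X%n",
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF); // prints "EF BB BF"
    }
}
```

Writing the BOM separately keeps the list of values untouched, which may be preferable when the same list is reused elsewhere.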

Upvotes: 3
