Mohamed Benmahdjoub

Reputation: 1290

How to Open File in UTF-8 and Write in Another File in UTF-16

How can I open a file in UTF-8 and write to another file in UTF-16?

I need an example because I'm having issues with some characters like 'é' and 'a'.

When I write "médic", the output file contains "m@#dic" instead.

Upvotes: 3

Views: 183

Answers (3)

Craig Schmidt

Reputation: 81

Adding to what fge said in his comment, I don't think changing the encoding on write is your problem. My guess is that the file you're reading isn't actually in UTF-8. Open that file in an editor with a hex mode, such as PSPad, and check the first two or three bytes for a byte order mark (BOM). If it has the UTF-8 BOM (EF BB BF), then I'm wrong. If it has no BOM at all, the file is probably in the OS's default encoding and not UTF-8. In that case you can usually work out the encoding by finding a character outside the ASCII range and looking at its actual bytes: 'é', for example, is the two bytes C3 A9 in UTF-8 but the single byte E9 in ISO-8859-1 or windows-1252.
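The BOM check described above can also be done from Java instead of a hex editor. This is only a minimal sketch: the class name BomCheck and the method hasUtf8Bom are hypothetical, and the path is assumed to arrive as the first command-line argument.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper; the names are illustrative and not from the answer.
class BomCheck {
    // True if the given bytes start with the UTF-8 BOM (EF BB BF).
    static boolean hasUtf8Bom(byte[] head) {
        return head.length >= 3
            && (head[0] & 0xFF) == 0xEF
            && (head[1] & 0xFF) == 0xBB
            && (head[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the path of the file to inspect.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] head = new byte[3];
            // A single read() may return fewer than 3 bytes; for a quick
            // check on a local file this is normally good enough.
            int n = in.read(head);
            byte[] seen = Arrays.copyOf(head, Math.max(n, 0));
            System.out.println(hasUtf8Bom(seen) ? "UTF-8 BOM found" : "no UTF-8 BOM");
        }
    }
}
```

A missing BOM does not prove the file is not UTF-8, since many UTF-8 files are written without one, so looking at the bytes of a non-ASCII character is still the more reliable test.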

Upvotes: 0

fge

Reputation: 121830

Do this:

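import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// srcpath and dstpath are java.nio.file.Path instances
try (
    final BufferedReader reader = Files.newBufferedReader(srcpath,
        StandardCharsets.UTF_8);
    final BufferedWriter writer = Files.newBufferedWriter(dstpath,
        StandardCharsets.UTF_16BE);
) {
    // copy decoded chars; the writer re-encodes them as UTF-16BE
    final char[] buf = new char[4096];
    int nrChars;
    while ((nrChars = reader.read(buf)) != -1)
        writer.write(buf, 0, nrChars);
    writer.flush();
}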

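NOTE: I chose big-endian UTF-16 because you didn't say which byte order you wanted. If you want little endian, use UTF_16LE instead. Neither of these writes a BOM to the output; StandardCharsets.UTF_16 writes a big-endian BOM followed by big-endian data.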

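Also, if the source file starts with a BOM and you want to skip it, just call:

reader.read();

before the copy loop. The BOM is a single code point (U+FEFF) which happens to be in the BMP, so it occupies exactly one char and a single read() consumes it. Note that this discards the first character unconditionally, so only do it when you know the BOM is there.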
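When it is not known in advance whether the source starts with a BOM, the first char can be inspected and put back if it turns out to be ordinary text. The following is a sketch under that assumption; the class name BomSkipper and the method skipBomIfPresent are hypothetical names, not part of the answer above.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical helper; the names are illustrative only.
class BomSkipper {
    // Consumes a leading U+FEFF if there is one; otherwise the reader is
    // rewound so that the first character is not lost.
    static void skipBomIfPresent(BufferedReader reader) throws IOException {
        reader.mark(1);              // remember the current position
        int first = reader.read();   // -1 on empty input
        if (first != 0xFEFF) {
            reader.reset();          // not a BOM: rewind to the mark
        }
    }
}
```

Calling BomSkipper.skipBomIfPresent(reader) once, right after opening the reader and before the copy loop, would replace the unconditional reader.read().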

Upvotes: 3

aioobe

Reputation: 421280

You can create a reader as follows:

InputStream is = new FileInputStream(inputFile);
InputStreamReader in = new InputStreamReader(is, "UTF-8");

and a writer as follows:

OutputStream os = new FileOutputStream(outputFile);
OutputStreamWriter out = new OutputStreamWriter(os, "UTF-16");
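For completeness, here is one way the reader and writer above might be wired together into a full copy. This is a sketch, not the answerer's code: the class name StreamTranscoder, the method copy, and the use of command-line arguments for the two paths are assumptions made for illustration. It uses StandardCharsets constants rather than charset names, which avoids the checked UnsupportedEncodingException.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper; the names are illustrative only.
class StreamTranscoder {
    // Decodes the input as UTF-8 and re-encodes it as UTF-16.
    // Both streams are closed when the copy finishes.
    static void copy(InputStream is, OutputStream os) throws IOException {
        try (Reader in = new InputStreamReader(is, StandardCharsets.UTF_8);
             Writer out = new OutputStreamWriter(os, StandardCharsets.UTF_16)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the UTF-8 source, args[1] the UTF-16 target.
        try (InputStream is = new FileInputStream(args[0]);
             OutputStream os = new FileOutputStream(args[1])) {
            copy(is, os);
        }
    }
}
```

With the "UTF-16" charset, Java writes a big-endian BOM (FE FF) at the start of the output, so a tool that opens the result can detect the byte order.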

Upvotes: 4
