IAmYourFaja

Reputation: 56944

How to read a file in Java with specific character encoding?

I am trying to read a file in as either UTF-8 or Windows-1252 depending on the output of this method:

public Charset getCorrectCharsetToApply() {
    // Returns a Charset for either UTF-8 or Windows-1252.
}

So far, I have:

String fileName = getFileNameToReadFromUserInput();
InputStream is = new ByteArrayInputStream(fileName.getBytes());
InputStreamReader isr = new InputStreamReader(is, getCorrectCharsetToApply());
BufferedReader buffReader = new BufferedReader(isr);

The problem I'm having is converting the BufferedReader instance to a FileReader.

Thanks in advance!

Upvotes: 29

Views: 76338

Answers (4)

queeg

Reputation: 9453

Picking the wrong charset will not crash the JVM. The code below runs, but it decodes the bytes of the file name rather than the content of the file:

String fileName = getFileNameToReadFromUserInput();
InputStream is = new ByteArrayInputStream(fileName.getBytes());
InputStreamReader isr = new InputStreamReader(is, getCorrectCharsetToApply());
BufferedReader buffReader = new BufferedReader(isr);

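Consider what happens if you pick the wrong encoding. An InputStreamReader created from a Charset does not throw; it silently replaces bytes it cannot decode with U+FFFD. If you create it from a CharsetDecoder that reports errors instead, a wrong charset shows up as an exception you can handle, and you can iterate over all the charsets Java knows about. Keep in mind that a clean read does not prove the charset is right, since single-byte charsets such as ISO-8859-1 accept any byte sequence:

import java.io.*;
import java.nio.charset.*;

static void processFile(String fileName, Charset charset) throws IOException {
    // Report undecodable bytes instead of replacing them with U+FFFD.
    CharsetDecoder decoder = charset.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
    try (
        InputStream is = new FileInputStream(fileName);
        InputStreamReader isr = new InputStreamReader(is, decoder);
        BufferedReader buffReader = new BufferedReader(isr);
    ) {
        ... (do whatever you have to do here by reading buffReader)
    }
}

public static void main(String[] args) {
    String fileName = getFileNameToReadFromUserInput();

    // try all the known charsets
    for (Charset charset : Charset.availableCharsets().values()) {
        try {
            processFile(fileName, charset);
            // still here? then no decoding error was reported
            return;
        } catch (Exception e) {
            System.out.println("Reading " + fileName + " with " + charset + " resulted in");
            e.printStackTrace();
            System.out.println("Trying next...");
        }
    }
}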

The try-with-resources block inside processFile ensures all files/streams/buffers get closed when they are no longer needed. The try/catch block in the main method ensures the program does not die in case we picked the wrong charset.
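
One way to make a wrong guess detectable is strict decoding. The following is a minimal sketch, not a definitive implementation: the class name StrictDecodeCheck and the helper decodesCleanly are made up for illustration, and it assumes the data is small enough to hold in memory as a byte array.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeCheck {

    // Hypothetical helper: returns true if the bytes decode under the charset
    // with no malformed or unmappable sequences, false otherwise.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 0xE9 is "é" in Windows-1252 but an incomplete sequence in UTF-8.
        byte[] data = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(decodesCleanly(data, StandardCharsets.UTF_8));           // prints false
        System.out.println(decodesCleanly(data, Charset.forName("windows-1252"))); // prints true
    }
}
```

For the UTF-8 versus Windows-1252 case in the question, a check like this could be run with UTF-8 first, falling back to Windows-1252 only if it fails.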

Upvotes: 0

dlauzon

Reputation: 1321

With Java 7+, you can create the Reader in one line:

BufferedReader buffReader = Files.newBufferedReader(Paths.get(fileName), getCorrectCharsetToApply());
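
For a fuller picture, here is a small self-contained sketch built around that call. The class name NioCharsetRead, the helper readLines and the temp file are assumptions for illustration only. Note that, per its documentation, a reader obtained this way throws an IOException when it hits a malformed or unmappable byte sequence, rather than silently substituting a replacement character.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class NioCharsetRead {

    // Hypothetical helper: reads all lines of the file, decoding with the given charset.
    static List<String> readLines(Path path, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative data written to a temp file, then read back.
        Path tmp = Files.createTempFile("nio-demo", ".txt");
        Files.write(tmp, "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(tmp, StandardCharsets.UTF_8)); // prints [first, second]
        Files.delete(tmp);
    }
}
```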

Upvotes: 6

shadowmatter

Reputation: 1382

Note that if you are using Google Guava, you can use Files.newReader:

final BufferedReader reader =
        Files.newReader(new File(filename), getCorrectCharsetToApply());

Upvotes: 4

Dennis Meng

Reputation: 5187

So, first, as a heads up, do realize that fileName.getBytes() as you have there gets the bytes of the filename, not the file itself.

Second, from the docs of FileReader:

The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.

So, it sounds like FileReader actually isn't the way to go (at least before Java 11, which added FileReader constructors that take a Charset). If we take the advice in the docs, then you should just change your code to have:

String fileName = getFileNameToReadFromUserInput();
FileInputStream is = new FileInputStream(fileName);
InputStreamReader isr = new InputStreamReader(is, getCorrectCharsetToApply());
BufferedReader buffReader = new BufferedReader(isr);

and not try to make a FileReader at all.
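
Put together as something you can run, a minimal sketch might look like the following. The class name CharsetFileRead, the helper readAll and the temp file are made up for illustration, and the charset is passed in directly instead of coming from getCorrectCharsetToApply().

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class CharsetFileRead {

    // Hypothetical helper: reads the whole file as text, decoding its bytes
    // with the given charset. Line terminators are normalized to '\n'.
    static String readAll(String fileName, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (FileInputStream is = new FileInputStream(fileName);
             InputStreamReader isr = new InputStreamReader(is, charset);
             BufferedReader buffReader = new BufferedReader(isr)) {
            String line;
            while ((line = buffReader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Illustrative data: "café" written as Windows-1252 bytes, then read back.
        Charset cp1252 = Charset.forName("windows-1252");
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        Files.write(tmp, "caf\u00e9\n".getBytes(cp1252));
        System.out.print(readAll(tmp.toString(), cp1252));
        Files.delete(tmp);
    }
}
```

The try-with-resources block closes the reader and the underlying stream even if reading fails partway through.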

Upvotes: 35
