dardy

Reputation: 433

Use InputStreamReader twice

I have to check a file's encoding before reading it. To check the encoding, I use this method:

    private void checkFileEncoding(InputStream is) throws IOException {
        try {
            CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
            decoder.onMalformedInput(CodingErrorAction.REPORT);
            decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
            // Decode the whole stream; an invalid byte sequence throws MalformedInputException
            final InputStreamReader input = new InputStreamReader(is, decoder);
            int data = input.read();
            while (data != -1) {
                data = input.read();
            }
            input.close();
        } catch (MalformedInputException e) {
            LOGGER.error("The file encoding is wrong!");
            throw e;
        }
    }
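The strict decoder idea used above can be tried on in-memory bytes. This is a minimal sketch, not the asker's code: the class `Utf8Check` and method `isValidUtf8` are hypothetical names, and it uses the JDK's one-shot `CharsetDecoder.decode(ByteBuffer)` instead of a reader, so no stream is involved.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Hypothetical helper: true if the bytes decode as UTF-8 with no malformed or unmappable input.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "h\u00e9llo".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false: byte 0xE9 followed by 'l' is not valid UTF-8
    }
}
```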

And here is the code that calls it:

    InputStream is = new FileInputStream(file);
    checkFileEncoding(is);

    List<MyObject> list = newArrayList();
    try (CSVReader reader = new CSVReader(new InputStreamReader(is), ';')) {
        list = reader.readAll().stream()
                .skip(1) // skip the header row
                .map(myObjectMap)
                .filter(o -> o != null)
                .collect(toList());
    }

The problem is that my list is empty when I call checkFileEncoding first. I think it's because I read the same stream twice. How should I handle this?

Upvotes: 2

Views: 310

Answers (2)

Pankaj Nimgade

Reputation: 4549

Try the GuessEncoding library:

    Charset charset = CharsetToolkit.guessEncoding(file, 4096, StandardCharsets.UTF_8);

This should return the detected charset.

I tried it on an HTML file and the detected charset was US-ASCII.

You could also try the Any23 library:

    Charset charset = Charset.forName(new TikaEncodingDetector().guessEncoding(new FileInputStream(file)));

Upvotes: 0

user254948

Reputation: 1056

    final InputStreamReader input = new InputStreamReader(is, decoder);

Your InputStreamReader reads all of the data from the input stream, so nothing is left for a second reader. In addition, you close it, and closing an InputStreamReader also closes the underlying stream.
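A small sketch of the exhaustion effect, assuming an in-memory `ByteArrayInputStream` in place of the asker's `FileInputStream`; `ExhaustedStream` and `drain` are hypothetical names. It deliberately does not close anything, so it only shows that a second reader on the same stream finds no data left.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ExhaustedStream {

    // Hypothetical helper: reads every character and returns how many there were.
    static int drain(Reader reader) throws IOException {
        int count = 0;
        while (reader.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        InputStream is = new ByteArrayInputStream("a;b\n1;2\n".getBytes(StandardCharsets.UTF_8));

        // First pass, like checkFileEncoding: consumes the stream to the end.
        int first = drain(new InputStreamReader(is, StandardCharsets.UTF_8));

        // Second pass on the SAME stream: nothing is left, so read() returns -1 immediately.
        int second = drain(new InputStreamReader(is, StandardCharsets.UTF_8));

        System.out.println(first + " " + second); // 8 0
    }
}
```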

You need to create the InputStream twice: once to check the character set and once to actually read the data.

So change

    InputStream is = new FileInputStream(file);
    checkFileEncoding(is);

to

    InputStream is = new FileInputStream(file);
    checkFileEncoding(is);
    is = new FileInputStream(file);

Also, after the

    try (CSVReader reader ...) {
        ...
    }

block, add

    is.close();
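The open-the-file-twice approach can be sketched end to end with try-with-resources, so each stream is closed by the pass that uses it. This is a sketch under assumptions: it takes a `Path` instead of an `InputStream`, the class name `ReadTwice` and method `readRows` are hypothetical, and it reads lines with `BufferedReader.lines()` in place of opencsv's `CSVReader` so it runs with the JDK alone.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class ReadTwice {

    // First pass on its own stream: throws MalformedInputException if the file is not valid UTF-8.
    static void checkFileEncoding(Path file) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (Reader input = new InputStreamReader(Files.newInputStream(file), decoder)) {
            while (input.read() != -1) {
                // only decode; an invalid byte sequence throws here
            }
        }
    }

    // Second pass on a fresh stream, so reading starts again from the first byte.
    static List<String> readRows(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.lines()
                    .skip(1) // skip the header row
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("data", ".csv");
        Files.write(file, List.of("name;age", "Ann;30", "Bob;25"), StandardCharsets.UTF_8);

        checkFileEncoding(file);
        System.out.println(readRows(file)); // [Ann;30, Bob;25]
        Files.delete(file);
    }
}
```

Because each pass opens and closes its own stream, there is no shared `is` variable to reassign or close by hand.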

Upvotes: 1
