Reputation: 433
I have to check a file's encoding before reading it. To check the encoding, I use this method:
void checkFileEncoding(InputStream is) throws IOException {
    // `file` is assumed to be a field holding the File being checked
    try {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        decoder.onMalformedInput(CodingErrorAction.REPORT);
        decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
        final InputStreamReader input = new InputStreamReader(is, decoder);
        int data = input.read();
        while (data != -1) {
            data = input.read();
        }
        input.close();
    } catch (MalformedInputException e) {
        LOGGER.error("The file encoding is wrong!");
        throw new MalformedInputException(Math.toIntExact(file.length()));
    }
}
And here is the code that calls it:
InputStream is = new FileInputStream(file);
checkFileEncoding(is);
List<MyObject> list = newArrayList();
try (CSVReader reader = new CSVReader(new InputStreamReader(is), ';')) {
    list = reader.readAll().stream()
            .skip(1) // skip the first row (header)
            .map(myObjectMap)
            .filter(o -> o != null)
            .collect(toList());
}
The problem is that my list is empty when I call checkFileEncoding
first. I think it's because I read the file twice. How should I handle this?
Upvotes: 2
Views: 310
Reputation: 4549
Try the GuessEncoding library:
Charset charset = CharsetToolkit.guessEncoding(file, 4096, StandardCharsets.UTF_8);
This should return the expected result. I tried it against an HTML file and got US-ASCII as the charset.
You could also try the Any23 library:
Charset charset = Charset.forName(new TikaEncodingDetector().guessEncoding(new FileInputStream(file)));
Upvotes: 0
Reputation: 1056
final InputStreamReader input = new InputStreamReader(is, decoder);
Your InputStreamReader reads all the data from the input stream, so nothing is left when CSVReader later reads from the same stream. In addition, input.close() also closes the underlying stream.
You need to create the InputStream twice: once to check the character set and once more to actually read the data.
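To see why the list ends up empty, here is a small self-contained demo that repeats the question's pattern on an in-memory stream: drain the stream through an InputStreamReader, close the reader, then inspect the underlying stream. It is only a sketch: a ByteArrayInputStream stands in for the FileInputStream, and the names ConsumedStreamDemo, TrackingStream and drainAndClose are hypothetical helpers added just to observe the behaviour.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ConsumedStreamDemo {
    // Hypothetical wrapper that records whether close() reached the wrapped stream.
    static class TrackingStream extends FilterInputStream {
        boolean closed = false;

        TrackingStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Mimics checkFileEncoding: read the whole stream through a Reader, then close the Reader.
    static void drainAndClose(InputStream is) throws IOException {
        InputStreamReader input = new InputStreamReader(is, StandardCharsets.UTF_8);
        while (input.read() != -1) {
            // discard characters, like the loop in the question
        }
        input.close(); // InputStreamReader.close() also closes `is`
    }

    public static void main(String[] args) throws IOException {
        ByteArrayInputStream raw =
                new ByteArrayInputStream("header;x\nrow1;1\n".getBytes(StandardCharsets.UTF_8));
        TrackingStream is = new TrackingStream(raw);
        drainAndClose(is);
        System.out.println("underlying stream closed: " + is.closed);
        System.out.println("bytes still available: " + raw.available());
    }
}
```

Running main prints `underlying stream closed: true` and `bytes still available: 0`, which is the state the CSVReader finds the stream in, so readAll() has no rows to return.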
So change
InputStream is = new FileInputStream(file);
checkFileEncoding(is);
to
InputStream is = new FileInputStream(file);
checkFileEncoding(is);
is = new FileInputStream(file);
Also, after the
try(CSVReader reader ..
..
}
block, add
is.close();
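A minimal sketch of the same "open the file twice" idea using try-with-resources, so each stream is closed automatically and an explicit is.close() becomes unnecessary. To keep it runnable with the JDK alone, BufferedReader.lines() stands in for opencsv's CSVReader; the class OpenTwice and the method readDataLines are hypothetical, and checkFileEncoding here is a simplified variant of the question's method that lets MalformedInputException propagate instead of logging it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class OpenTwice {
    // First pass: drain the stream through a strict UTF-8 decoder.
    // Invalid bytes raise MalformedInputException (a subclass of IOException).
    static void checkFileEncoding(InputStream is) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        InputStreamReader input = new InputStreamReader(is, decoder);
        char[] buf = new char[8192];
        while (input.read(buf) != -1) {
            // discard; only decoding errors matter here
        }
    }

    static List<String> readDataLines(Path file) throws IOException {
        // Stream #1: used only for the check, closed when the block exits.
        try (InputStream is = Files.newInputStream(file)) {
            checkFileEncoding(is);
        }
        // Stream #2: a fresh stream positioned at the start of the file.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.lines()
                    .skip(1) // skip the header row, as in the question
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".csv");
        Files.write(tmp, "h1;h2\na;1\nb;2\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readDataLines(tmp));
        Files.delete(tmp);
    }
}
```

In the real code, the second try block would wrap `new CSVReader(new InputStreamReader(Files.newInputStream(file)), ';')` instead of the BufferedReader.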
Upvotes: 1