Reputation: 89
Could you please tell me what to choose if I need to read a very big (~1 GB) .txt file containing unformatted data (mostly String text) in UTF-8: Scanner, BufferedReader, or maybe something even better (perhaps from NIO or third-party libraries)?
Upvotes: 4
Views: 95
Reputation: 718758
It depends on what you are trying to do with the file.
For example, ask yourself:
Once you have figured out that side of things, one of the alternatives you are considering for reading the file is likely to come out as a better match than the others.
(And we certainly can't give you sound / balanced advice on the best way to read the data if we don't understand what you are intending to do with it.)
My advice is to think about how you are processing data before you spend your time on efficiency concerns. There is a good chance that the choice of technique / API for reading the file won't be what is limiting your application's overall performance.
Upvotes: 3
Reputation: 24508
The size of the file does not matter for correctness (as long as you have enough RAM to store the intermediate data), but it does matter in terms of performance. This website explains how to read UTF-8 in Java. It uses InputStreamReader:
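import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Inside main(String[] args): try-with-resources closes the reader
// even if an exception is thrown while reading.
try (BufferedReader fin = new BufferedReader(
        new InputStreamReader(new FileInputStream(args[0]), StandardCharsets.UTF_8))) {
    String line;
    while ((line = fin.readLine()) != null) {
        // do something with line
    }
} catch (IOException e) {
    e.printStackTrace();
}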
Note that this example reads line by line. For large files, IO performance is important, so you might want to read the raw data in chunks of 4 KB or 8 KB instead. Note, though, that a byte chunk might break up a character: a UTF-8 character occupies one to four bytes, so there is no way of telling in advance whether a character ends exactly on a chunk boundary.
In that case, either treat the content as raw bytes until you have finished reading, or inspect the end of each chunk to find out whether its trailing bytes form an incomplete character that must be carried over to the start of the next chunk before decoding.
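As a rough illustration of the chunk-boundary problem, here is a minimal sketch that reads raw bytes in fixed-size chunks and lets a CharsetDecoder hold back any incomplete trailing bytes until the next chunk arrives. The class name ChunkedUtf8Reader, the method decodeInChunks, and the sink callback are hypothetical names chosen for this example, not something from the linked website. Keep in mind that InputStreamReader already handles split characters internally, so this is only needed if you really want to manage the byte chunks yourself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

public class ChunkedUtf8Reader {

    /**
     * Hypothetical helper: reads up to chunkSize raw bytes at a time from in,
     * decodes them as UTF-8, and passes each decoded piece to sink.
     */
    public static void decodeInChunks(InputStream in, int chunkSize, Consumer<String> sink)
            throws IOException {
        if (chunkSize < 4) {
            // An incomplete UTF-8 character is at most 3 bytes long, so a buffer of
            // 4 or more bytes always has room left for at least one new byte.
            throw new IllegalArgumentException("chunkSize must be at least 4");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bytes = ByteBuffer.allocate(chunkSize);
        CharBuffer chars = CharBuffer.allocate(chunkSize);

        boolean eof = false;
        while (!eof) {
            // Fill the free part of the buffer; carried-over bytes sit at the front.
            int n = in.read(bytes.array(), bytes.position(), bytes.remaining());
            if (n < 0) {
                eof = true;
            } else {
                bytes.position(bytes.position() + n);
            }
            bytes.flip();

            CoderResult result;
            do {
                // The decoder consumes only complete characters and leaves an
                // incomplete trailing sequence in the byte buffer.
                result = decoder.decode(bytes, chars, eof);
                if (result.isError()) {
                    result.throwException(); // malformed or truncated input
                }
                chars.flip();
                if (chars.hasRemaining()) {
                    sink.accept(chars.toString());
                }
                chars.clear();
            } while (result.isOverflow());

            // Move the undecoded trailing bytes to the front for the next read.
            bytes.compact();
        }

        decoder.flush(chars);
        chars.flip();
        if (chars.hasRemaining()) {
            sink.accept(chars.toString());
        }
    }
}
```

In practice you would call this with a FileInputStream and a chunk size such as 8192, and do your processing inside the sink instead of collecting everything in memory. Whether this is actually faster than BufferedReader for your workload is something you would have to measure.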
Upvotes: 2