Reputation: 3850
Java 8's Stream API is convenient and has gained popularity. For file I/O, I found two APIs that produce a stream of lines: Files.lines(path) and bufferedReader.lines(). I did not find a stream API that provides a Stream of fixed-size buffers for reading files, though.
My concern is: for files with very long lines, e.g. a 4 GB file containing only a single line, aren't these line-based APIs very inefficient? The line-based reader needs at least 4 GB of memory to hold that line, whereas a fixed-size buffer reader (fileInputStream.read(byte[] b, int off, int len)) uses at most the buffer size of memory.
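For illustration, this is the kind of fixed-size buffer loop I have in mind (a minimal sketch; the method name, file name parameter and 8192-byte buffer size are just placeholders):

import java.io.FileInputStream;
import java.io.IOException;

static void readInChunks(String fileName) throws IOException {
    try (FileInputStream in = new FileInputStream(fileName)) {
        byte[] buf = new byte[8192];                    // fixed-size buffer
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            // process buf[0..n) here; memory use never exceeds the buffer size
        }
    }
}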
If this concern is valid, is there a more efficient stream-based API for file I/O?
Upvotes: 0
Views: 2665
Reputation: 298203
Which method of delivery is appropriate depends on how you want to process the data. If your processing requires handling the data line by line, there is no way around doing it that way.
If you really want fixed-size chunks of character data, you can use the following method(s):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Objects;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public static Stream<String> chunks(Path path, int chunkSize) throws IOException {
    return chunks(path, chunkSize, StandardCharsets.UTF_8);
}

public static Stream<String> chunks(Path path, int chunkSize, Charset cs)
        throws IOException {
    Objects.requireNonNull(path);
    Objects.requireNonNull(cs);
    if(chunkSize <= 0) throw new IllegalArgumentException();
    CharBuffer cb = CharBuffer.allocate(chunkSize);
    BufferedReader r = Files.newBufferedReader(path, cs);
    return StreamSupport.stream(
        new Spliterators.AbstractSpliterator<String>(
                Files.size(path)/chunkSize, Spliterator.ORDERED|Spliterator.NONNULL) {
            @Override public boolean tryAdvance(Consumer<? super String> action) {
                // fill the buffer until it is full or the end of the file is reached
                try { do {} while(cb.hasRemaining() && r.read(cb) > 0); }
                catch(IOException ex) { throw new UncheckedIOException(ex); }
                if(cb.position() == 0) return false; // nothing read: end of stream
                action.accept(cb.flip().toString());
                cb.clear(); // reset the buffer for the next chunk
                return true;
            }
        }, false).onClose(() -> {
            // the reader is closed only when the returned stream is closed
            try { r.close(); } catch(IOException ex) { throw new UncheckedIOException(ex); }
        });
}
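For illustration, a possible way to consume it (the file name and chunk size are placeholders); the try-with-resources block matters because the underlying reader is only closed through the stream's onClose handler:

// requires java.nio.file.Paths in addition to the imports above
try (Stream<String> s = chunks(Paths.get("data.txt"), 8192)) {
    s.forEach(chunk -> System.out.println(chunk.length()));
}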
But I wouldn't be surprised if your next question were "how can I merge adjacent stream elements?", as these fixed-size chunks are rarely the natural data unit for your actual task.
More often than not, the subsequent step is to perform pattern matching within the contents. In that case, it's better to use a Scanner in the first place, which can perform pattern matching while streaming the data. This can be done efficiently because the regex engine reports whether buffering more data could change the outcome of a match operation (see hitEnd() and requireEnd()). Unfortunately, generating a stream of matches from a Scanner has only been added in Java 9, but see this answer for a back-port of that feature to Java 8.
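To illustrate the Java 9+ variant (the pattern and file name are just placeholders), Scanner.findAll produces a Stream<MatchResult> while reading the file incrementally, so no complete line is ever materialized:

// requires java.util.Scanner, java.util.regex.Pattern,
// java.util.regex.MatchResult and java.nio.file.Paths
try (Scanner s = new Scanner(Paths.get("huge.txt"), "UTF-8")) {
    s.findAll(Pattern.compile("\\d+"))   // stream of matches
     .map(MatchResult::group)
     .forEach(System.out::println);
}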
Upvotes: 2
Reputation: 73558
If you have a 4 GB text file with a single line, and you're processing it "line by line", then you've made a serious error in your programming by not understanding the data you're working with.
They're convenience methods for when you need to do simple work with data like CSV or similar formats, where the line sizes are manageable.
A real-life example of a 4 GB text file with a single line would be an XML file without line breaks. You would use a streaming XML parser to read that, not roll your own solution that reads line by line.
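For instance, a StAX pull parser reads the document in bounded memory regardless of line breaks (a minimal sketch; the method and file names are placeholders):

import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

static void listElements(String fileName) throws Exception {
    try (FileInputStream in = new FileInputStream(fileName)) {
        XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(in);
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                System.out.println(reader.getLocalName()); // handle each element as it streams by
            }
        }
        reader.close();
    }
}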
Upvotes: 5