nullByteMe

Reputation: 6391

What is the proper way to handle large files?

How should a large file be handled in Java when you need to run the bytes through a variety of methods?

The way I was doing it before is like this:

private byte[] inputStreamToByteArray(InputStream inputStream) throws IOException {
   BufferedInputStream bis = new BufferedInputStream(inputStream);
   ByteArrayOutputStream baos = new ByteArrayOutputStream();

   byte[] buffer = new byte[8192];

   int nRead;
   while ((nRead = bis.read(buffer)) != -1) {
      baos.write(buffer, 0, nRead);
   }

   return baos.toByteArray();
}

I get a Java OutOfMemoryError doing it this way because the byte array gets too large.

So I tried chaining streams together, but I'm not sure whether that is the proper way to do it, because I don't understand streams well enough.

Should large files be handled using chunks from a byte array, or by passing around InputStreams?

Upvotes: 2

Views: 3833

Answers (2)

Glen Best

Reputation: 23105

Either:

  1. Process the file via memory-mapped files. A single mapping handles up to 2GB - if you want to dedicate that much address space! It integrates with the operating system's native IO and memory buffers, which can improve performance somewhat (see the first sketch after this list).

     java.nio.MappedByteBuffer buff =
         file.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, file.length());
    

    Then access various parts of the buffer - they will be paged into Java memory as needed, so some IO chunking still occurs. But logically, to your program, it looks as if it's processing the entire file (with some abstraction leakage: I/O performance dips during paging).

  2. Process chunks as you read them, instead of appending to an ever-growing ByteArrayOutputStream. Either read chunks large enough that each one has meaning to your program on its own, or aggregate the pieces only to the point where they have meaning and can be processed, and discard them before the next read (see the second sketch below).
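For illustration, here is a minimal sketch of option (1). It is not from the answer itself: the file name "big.dat" is a placeholder, and obtaining the channel from a RandomAccessFile is one choice among several.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedFileDemo {
    public static void main(String[] args) throws IOException {
        // "big.dat" is a placeholder file name for this sketch
        try (RandomAccessFile raf = new RandomAccessFile("big.dat", "r");
             FileChannel channel = raf.getChannel()) {
            // A single mapping is limited to Integer.MAX_VALUE bytes (~2 GB);
            // larger files need several mappings over successive regions.
            MappedByteBuffer buf =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF; // the OS pages regions in as they are touched
            }
            System.out.println("byte sum = " + sum);
        }
    }
}

Accessing the buffer drives demand paging, so heap use stays bounded even though the whole file appears addressable.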

Often (2) performs well. (1) can also perform well and is occasionally simpler, but it is more memory-expensive.
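And a minimal sketch of option (2), aggregating bytes only until a record boundary is reached and discarding each record before the next read. The newline delimiter and the RecordHandler callback are assumptions made for the example, standing in for whatever "has meaning to your program".

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class RecordChunker {
    // Hypothetical callback for whatever processing each record needs.
    interface RecordHandler { void handle(byte[] record); }

    static void process(InputStream in, RecordHandler handler) throws IOException {
        byte[] buffer = new byte[8192];
        ByteArrayOutputStream pending = new ByteArrayOutputStream(); // partial record only
        int nRead;
        while ((nRead = in.read(buffer)) != -1) {
            int start = 0;
            for (int i = 0; i < nRead; i++) {
                if (buffer[i] == '\n') {                  // record boundary
                    pending.write(buffer, start, i - start);
                    handler.handle(pending.toByteArray());
                    pending.reset();                       // discard before reading on
                    start = i + 1;
                }
            }
            pending.write(buffer, start, nRead - start);   // carry over the partial tail
        }
        if (pending.size() > 0) handler.handle(pending.toByteArray()); // last record
    }
}

Only the 8 KB buffer and at most one in-progress record are ever held in memory.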

See also: Most Robust way of reading a file or stream using Java (To prevent DoS attacks)

:)

Upvotes: 1

Peter Lawrey

Reputation: 533492

Should large files be handled using chunks from a byte array or by passing around inputstreams?

Large files should be read in chunks of, say, 8192 bytes, exactly as you do in your example. Instead of copying the data into an array and processing the array afterwards, just process the data as you read it.
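For example, a minimal sketch along these lines, where computing a SHA-256 hash is my stand-in for whatever per-chunk processing you actually need:

import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamHash {
    static byte[] sha256(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        int nRead;
        while ((nRead = in.read(buffer)) != -1) {
            md.update(buffer, 0, nRead); // consume each chunk immediately
        }
        // Only the 8 KB buffer and the digest state are held in memory,
        // no matter how large the file is.
        return md.digest();
    }
}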

Upvotes: 3
