fhucho

Reputation: 34560

What is the fastest way to load a big 2D int array from a file?

I'm loading a 2D array from a file; it's 15,000,000 × 3 ints (it will eventually be 40,000,000 × 3). Right now, I use dataInputStream.readInt() to read the ints sequentially, and it takes ~15 seconds. Can I make it significantly (at least 3x) faster, or is this about as fast as it gets?
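The question doesn't show the reading code, so here is a minimal sketch of the baseline it describes. It assumes the file is a headerless sequence of big-endian ints, as written by DataOutputStream.writeInt() (the format readInt() expects); StreamLoader is a hypothetical name. The BufferedInputStream wrapper is worth checking in your own code: without it, every readInt() call goes to the underlying file.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical baseline loader. Assumes rows * 3 big-endian ints, no header.
class StreamLoader {
    static int[][] load(Path file) throws IOException {
        int rows = (int) (Files.size(file) / 12); // 3 ints * 4 bytes per row
        int[][] data = new int[rows][3];
        // BufferedInputStream (8 KB buffer by default) turns many tiny reads
        // into a few large ones; DataInputStream then decodes from that buffer.
        try (InputStream raw = Files.newInputStream(file);
             DataInputStream in = new DataInputStream(new BufferedInputStream(raw))) {
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < 3; c++) {
                    data[r][c] = in.readInt();
                }
            }
        }
        return data;
    }
}
```

Usage: `int[][] data = StreamLoader.load(Paths.get("/path/to/file"));`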

Upvotes: 4

Views: 698

Answers (2)

Adam Stelmaszczyk

Reputation: 19857

Yes, you can. From a benchmark of 13 different ways of reading files:

If you have to pick the fastest approach, it would be one of these:

  • FileChannel with a MappedByteBuffer and array reads.
  • FileChannel with a direct ByteBuffer and array reads.
  • FileChannel with a wrapped array ByteBuffer and direct array access.
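A minimal sketch in the spirit of the second option (FileChannel with a direct ByteBuffer), assuming the same headerless big-endian file layout as DataOutputStream produces. It is a variant rather than the benchmark's exact code: it decodes each int with ByteBuffer.getInt() instead of first copying bytes into an array. ChannelLoader and the 64 KB buffer size are illustrative choices, not taken from the benchmark.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical loader: FileChannel + direct ByteBuffer, big-endian ints, no header.
class ChannelLoader {
    static int[][] load(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int rows = (int) (ch.size() / 12);          // 3 ints * 4 bytes per row
            int[][] data = new int[rows][3];
            // Direct buffers are big-endian by default, matching DataOutputStream.
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
            int r = 0, c = 0;
            while (r < rows) {
                if (ch.read(buf) < 0) {
                    throw new IOException("unexpected end of file");
                }
                buf.flip();                             // switch from filling to draining
                while (buf.remaining() >= 4 && r < rows) {
                    data[r][c] = buf.getInt();
                    if (++c == 3) { c = 0; r++; }
                }
                buf.compact();                          // keep a partial int for the next read
            }
            return data;
        }
    }
}
```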

For the best Java read performance, there are 4 things to remember:

  • Minimize I/O operations by reading an array at a time, not a byte at a time. An 8 KB array is a good size (that's why it's the default buffer size for BufferedInputStream).
  • Minimize method calls by getting data an array at a time, not a byte at a time. Use array indexing to get at bytes in the array.
  • Minimize thread synchronization locks if you don't need thread safety. Either make fewer method calls to a thread-safe class, or use a non-thread-safe class such as ByteBuffer or MappedByteBuffer.
  • Minimize data copying between the JVM/OS, internal buffers, and application arrays. Use FileChannel with memory mapping, or a direct or wrapped array ByteBuffer.
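The four guidelines can be combined in one loop: read 8 KB at a time into a byte[] through a wrapped ByteBuffer (the third option above), then decode ints with plain array indexing and shifts rather than a method call per int. This is a sketch under the same headerless big-endian assumption; ArrayLoader is a hypothetical name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical loader: wrapped byte[] + manual big-endian decoding.
class ArrayLoader {
    static int[][] load(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int rows = (int) (ch.size() / 12);          // 3 ints * 4 bytes per row
            int[][] data = new int[rows][3];
            byte[] bytes = new byte[8 * 1024];          // 8 KB chunk
            ByteBuffer buf = ByteBuffer.wrap(bytes);    // reads land directly in 'bytes'
            int r = 0, c = 0;
            while (r < rows) {
                if (ch.read(buf) < 0) {
                    throw new IOException("unexpected end of file");
                }
                int limit = buf.position();             // number of valid bytes in 'bytes'
                int i = 0;
                while (limit - i >= 4 && r < rows) {
                    // Big-endian decode by array indexing; no per-int method call.
                    data[r][c] = (bytes[i] << 24)
                            | ((bytes[i + 1] & 0xFF) << 16)
                            | ((bytes[i + 2] & 0xFF) << 8)
                            | (bytes[i + 3] & 0xFF);
                    i += 4;
                    if (++c == 3) { c = 0; r++; }
                }
                // Move a leftover partial int (0-3 bytes) to the front, read after it.
                int leftover = limit - i;
                System.arraycopy(bytes, i, bytes, 0, leftover);
                buf.clear();
                buf.position(leftover);
            }
            return data;
        }
    }
}
```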

Upvotes: 7

fge

Reputation: 121860

Map your file into memory!

Java 7 code:

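import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

FileChannel channel = FileChannel.open(Paths.get("/path/to/file"),
    StandardOpenOption.READ);
// map(mode, position, size): the map mode is the first argument
ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

// use buf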

See here for more details.

If you're on Java 6, which has no FileChannel.open(), get the channel from a RandomAccessFile instead:

RandomAccessFile file = new RandomAccessFile("/path/to/file", "r");
FileChannel channel = file.getChannel();
// same thing to obtain buf

You can also call .asIntBuffer() on the buffer to read ints directly. You read only what you actually need, when you need it, and the mapped file does not take up space on your heap.
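A sketch that maps the file and uses asIntBuffer() to fill the int[n][3] array from the question, assuming the same headerless big-endian layout (the default byte order of a mapped buffer, and what DataOutputStream writes). MappedLoader is a hypothetical name. One limit to keep in mind: a single map() call can cover at most Integer.MAX_VALUE bytes (about 2 GB), still well above the 40,000,000 × 3 × 4 = 480,000,000 bytes of the eventual file.

```java
import java.io.IOException;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical loader: memory-mapped file viewed as an IntBuffer.
class MappedLoader {
    static int[][] load(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            IntBuffer ints = mapped.asIntBuffer();  // big-endian view, 4 bytes per int
            int rows = ints.remaining() / 3;
            int[][] data = new int[rows][3];
            for (int r = 0; r < rows; r++) {
                ints.get(data[r]);                  // bulk-copies 3 ints into row r
            }
            return data;
        }
    }
}
```

If you don't need the whole int[][] up front, you can keep the IntBuffer and call ints.get(row * 3 + col) on demand instead.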

Upvotes: 7
