Reputation: 28767
I use Java 1.5 on an embedded Linux device and want to read a binary file with 2 MB of int values (currently 4 bytes each, big endian, but I can decide the format).
Using a DataInputStream via a BufferedInputStream and calling dis.readInt(), these 500 000 calls take 17 s, whereas reading the file into one big byte buffer takes 5 s.
How can I read that file faster into one huge int[]?
The reading process should not use more than an additional 512 KB.
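For reference, the readInt() baseline looks roughly like this (a minimal sketch; the buffer size shown is just an example):

// read 500 000 ints one at a time with DataInputStream.readInt()
int numInts = 500000;
int[] result = new int[numInts];
DataInputStream dis = new DataInputStream(
        new BufferedInputStream(new FileInputStream("filename"), 512 * 1024));
for (int i = 0; i < numInts; i++) {
    result[i] = dis.readInt(); // one call per int -- this is what takes 17 s
}
dis.close();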
The nio code below is not faster than the readInt() approach from java.io.
// assume I already know that there are 500 000 ints to read:
int numInts = 500000;
// the result goes into this array
int[] result = new int[numInts];
int cnt = 0;
RandomAccessFile aFile = new RandomAccessFile("filename", "r");
FileChannel inChannel = aFile.getChannel();
ByteBuffer buf = ByteBuffer.allocate(512 * 1024);
int bytesRead = inChannel.read(buf); //read into buffer.
while (bytesRead != -1) {
    buf.flip(); // make buffer ready for get()
    while (buf.hasRemaining() && cnt < numInts) {
        // probably slow here since called 500 000 times
        result[cnt] = buf.getInt();
        cnt++;
    }
    buf.clear(); // make buffer ready for writing
    bytesRead = inChannel.read(buf);
}
aFile.close();
inChannel.close();
Update: Evaluation of the answers:
On the PC, the memory map with IntBuffer approach was the fastest in my setup.
On the embedded device, without a JIT, the java.io DataInputStream.readInt() approach was a bit faster (17 s vs. 20 s for the memory map with IntBuffer).
Final conclusion: a significant speed-up is easier to achieve via an algorithmic change (a smaller file for initialization).
Upvotes: 5
Views: 6742
Reputation: 45616
You can use an IntBuffer from the nio package: http://docs.oracle.com/javase/6/docs/api/java/nio/IntBuffer.html
int[] intArray = new int[ 500000 ];
IntBuffer intBuffer = IntBuffer.wrap( intArray );
...
Fill in the buffer by making calls to inChannel.read(intBuffer).
Once the buffer is full, your intArray will contain 500 000 integers.
EDIT
After realizing that Channels only support ByteBuffer, here is a version that reads into a ByteBuffer and views it as an IntBuffer:
// assume I already know that there are 500 000 ints to read:
int numInts = 500000;
// the result goes into this array
int[] result = new int[numInts];
// 4 bytes per int, direct buffer
ByteBuffer buf = ByteBuffer.allocateDirect( numInts * 4 );
// BIG_ENDIAN byte order
buf.order( ByteOrder.BIG_ENDIAN );
// Fill in the buffer
while ( buf.hasRemaining( ) )
{
    // Per EJP's suggestion, check the EOF condition
    if( inChannel.read( buf ) == -1 )
    {
        // Hit EOF
        throw new EOFException( );
    }
}
buf.flip( );
// Create IntBuffer view
IntBuffer intBuffer = buf.asIntBuffer( );
// result will now contain all ints read from file
intBuffer.get( result );
Upvotes: 3
Reputation: 1663
I ran a fairly careful experiment using serialize/deserialize, DataInputStream vs ObjectInputStream, both based on ByteArrayInputStream to avoid I/O effects. For a million ints, readObject took about 20 msec, readInt about 116 msec. The serialization overhead on a million-int array was 27 bytes. This was on a 2013-ish MacBook Pro.
Having said that, object serialization is sort of evil, and you have to have written the data out with a Java program.
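For reference, a rough sketch of the kind of comparison I mean (array size and stream setup here are illustrative, not the exact benchmark code):

// one readObject() call for the whole array
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(bos);
oos.writeObject(new int[1000000]);
oos.close();
ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
int[] viaSerialization = (int[]) ois.readObject();

// versus one readInt() call per element
DataInputStream dis = new DataInputStream(new ByteArrayInputStream(new byte[4 * 1000000]));
int[] viaReadInt = new int[1000000];
for (int i = 0; i < viaReadInt.length; i++) {
    viaReadInt[i] = dis.readInt();
}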
Upvotes: 2
Reputation: 2656
I don't know if this will be any faster than what Alexander provided, but you could try mapping the file.
try (FileInputStream stream = new FileInputStream(filename)) {
    FileChannel inChannel = stream.getChannel();
    // map the whole file into memory rather than reading it into a heap buffer
    ByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
    int[] result = new int[500000];
    buffer.order(ByteOrder.BIG_ENDIAN);
    // view the mapped bytes as ints and bulk-copy them into the array
    IntBuffer intBuffer = buffer.asIntBuffer();
    intBuffer.get(result);
}
Upvotes: 4