Reputation: 2937
I'm trying to read a 16 MB binary file that contains only 16-bit integers. I read it in 1 MB chunks, each of which gives me an array of bytes, and I convert every byte array to a short array with the convert function below. Reading the file through this buffer and converting it to shorts takes 5 seconds. Is there a faster way than my solution?
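def convert(in: Array[Byte]): Array[Short] = in.grouped(2).map {
  // lone trailing byte: treat it as the high byte of a final short
  case Array(one) => (one << 8).toShort
  // mask the low byte so its sign extension cannot overwrite the high byte
  case Array(hi, lo) => (hi << 8 | (lo & 0xFF)).toShort
}.toArray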
import java.io.RandomAccessFile

val startTime = System.nanoTime()
val file = new RandomAccessFile("foo", "r")
val defaultBlockSize = 1 * 1024 * 1024
val byteBuffer = new Array[Byte](defaultBlockSize)
val chunkNums = (file.length / defaultBlockSize).toInt
for (i <- 1 to chunkNums) {
  val seek = (i - 1) * defaultBlockSize
  file.seek(seek)
  file.read(byteBuffer)
  val s = convert(byteBuffer)
  println(byteBuffer.size)
}
file.close()
val stopTime = System.nanoTime()
println("Elapsed time = " + ((stopTime - startTime) / 1000000000.0) + " s")
Upvotes: 0
Views: 2163
Reputation: 167901
16 MB easily fits in memory unless you're running this on a feature phone or something. No need to chunk it and make the logic harder.
Just gulp the whole file at once with java.nio.file.Files.readAllBytes:
val buffer = java.nio.file.Files.readAllBytes(myfile.toPath)
assuming you are not stuck with Java 1.6. (If you are stuck with Java 1.6, pre-allocate your buffer using myfile.length, and use read on a FileInputStream to get it all in one go. It's not much harder, just don't forget to close it!)
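For the Java 1.6 route, something along these lines should work. This is only a sketch: the names ReadAll and readFully are made up for illustration, and it assumes the file is smaller than 2 GB. A single read normally returns everything for a local file, but the loop guards against a short read.

```scala
import java.io.{EOFException, File, FileInputStream}

object ReadAll {
  // Hypothetical helper (not from the original answer): read a whole file
  // into memory without java.nio.file.Files.
  def readFully(f: File): Array[Byte] = {
    // Assumption: the file fits in an Int-sized array (under 2 GB).
    val buffer = new Array[Byte](f.length.toInt)
    val in = new FileInputStream(f)
    try {
      var off = 0
      // read may return fewer bytes than requested, so loop until full
      while (off < buffer.length) {
        val n = in.read(buffer, off, buffer.length - off)
        if (n < 0) throw new EOFException("file shrank while reading")
        off += n
      }
      buffer
    } finally in.close() // always release the file handle
  }
}
```

Call it as ReadAll.readFully(myfile) and hand the result to ByteBuffer.wrap as described next.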
Then if you don't want to convert it yourself, you can
val bb = java.nio.ByteBuffer.wrap(buffer)
bb.order(java.nio.ByteOrder.nativeOrder)  // use ByteOrder.BIG_ENDIAN instead to match the hi/lo order of convert in the question
val shorts = new Array[Short](buffer.length/2)
bb.asShortBuffer.get(shorts)
And you're done.
Note that this is all Java stuff; there's nothing Scala-specific here save the syntax.
If you're wondering why this is so much faster than your code, it's because grouped(2) boxes the bytes and places them in an array. That's three allocations for every short you want! You can do it yourself by indexing the array directly, and that will be fast, but why would you want to when ByteBuffer and friends do exactly what you need already?
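If you do want to see the hand-rolled version, indexing the array directly might look roughly like this. It's a sketch, not a benchmarked implementation: IndexedConvert and convertIndexed are names I made up, and it assumes the big-endian (high byte first) layout that the question's convert uses, with a lone trailing byte treated as a high byte.

```scala
object IndexedConvert {
  // Hypothetical helper: big-endian bytes to shorts with a plain while loop,
  // so nothing is allocated per element.
  def convertIndexed(in: Array[Byte]): Array[Short] = {
    val out = new Array[Short]((in.length + 1) / 2)
    var i = 0
    while (i + 1 < in.length) {
      // Mask the low byte so its sign bit does not bleed into the high byte.
      out(i / 2) = ((in(i) << 8) | (in(i + 1) & 0xFF)).toShort
      i += 2
    }
    // Odd length: the last byte becomes the high byte of one extra short.
    if (i < in.length) out(i / 2) = (in(i) << 8).toShort
    out
  }
}
```

The only allocation here is the output array itself.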
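If you really really care about that last (odd) byte, then you can use (buffer.length + 1)/2 for the size of shorts, fill only the whole pairs with bb.asShortBuffer.get(shorts, 0, buffer.length/2) (a plain get(shorts) would underflow on the oversized array), and tack on
if ((buffer.length & 1) == 1) shorts(shorts.length-1) = ((buffer(buffer.length-1) & 0xFF) << 8).toShort
to grab the last byte. Read it from buffer rather than with bb.get, because the short view does not advance bb's own position.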
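Put together, the odd-byte handling could be wrapped up something like this. Again a sketch: OddTail and toShorts are illustrative names, and the byte order is a parameter because the right choice depends on how the file was written.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object OddTail {
  // Hypothetical helper: convert bytes to shorts via ByteBuffer, keeping a
  // trailing odd byte as the high byte of one extra short.
  def toShorts(buffer: Array[Byte], order: ByteOrder): Array[Short] = {
    val full = buffer.length / 2 // number of complete byte pairs
    val shorts = new Array[Short]((buffer.length + 1) / 2)
    // Copy only the complete pairs; asking for more would underflow.
    ByteBuffer.wrap(buffer).order(order).asShortBuffer.get(shorts, 0, full)
    if ((buffer.length & 1) == 1)
      shorts(full) = ((buffer(buffer.length - 1) & 0xFF) << 8).toShort
    shorts
  }
}
```

For example, OddTail.toShorts(buffer, ByteOrder.BIG_ENDIAN) treats the first byte of each pair as the high byte, which is what the question's convert does.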
Upvotes: 3
Reputation: 17933
A couple of issues pop out:
If byteBuffer is always going to be 1024*1024 size then the case Array(one) in convert will never actually be used and therefore pattern matching is unnecessary.
Also, you can avoid the for loop with a tail recursive function. After the val byteBuffer = ... line you can replace the chunkNums and for loop with:
@scala.annotation.tailrec
def readAndConvert(b: List[Array[Short]], file: RandomAccessFile): List[Array[Short]] = {
  if (file.read(byteBuffer) < 0)
    b
  else {
    // read already advances the file pointer, so no skipBytes call is needed
    // (skipping another 1 MB here would drop every other chunk)
    readAndConvert(convert(byteBuffer) +: b, file)
  }
}
val sValues = readAndConvert(List.empty[Array[Short]], file)
Note: because list prepending is much faster than appending, the above loop gives you the converted values in reverse order from the reading order in the file. Call .reverse on the result if you need them in file order.
Upvotes: 0