alifirat

Reputation: 2937

Efficient way to read binary files in Scala

I'm trying to read a 16 MB binary file that contains only 16-bit integers. To do that, I read the file in chunks of 1 MB, which gives me an array of bytes. For my own needs, I convert each byte array to a short array with the function convert below. But reading the file through a buffer and converting it to a short array takes 5 seconds. Is there a faster way than my solution?

def convert(in: Array[Byte]): Array[Short] = in.grouped(2).map {
  // odd trailing byte: treat it as the high byte, low byte zero
  case Array(one) => (one << 8).toShort
  // mask the low byte so sign extension doesn't clobber the high byte
  case Array(hi, lo) => ((hi << 8) | (lo & 0xFF)).toShort
}.toArray

import java.io.RandomAccessFile

val startTime = System.nanoTime()

val file = new RandomAccessFile("foo", "r")
val defaultBlockSize = 1 * 1024 * 1024
val byteBuffer = new Array[Byte](defaultBlockSize)
// note: any final partial chunk (file.length % defaultBlockSize bytes) is ignored
val chunkNums = (file.length / defaultBlockSize).toInt
for (i <- 1 to chunkNums) {
  val seek = (i - 1).toLong * defaultBlockSize
  file.seek(seek)
  file.read(byteBuffer)
  val s = convert(byteBuffer)
  println(byteBuffer.length)
}

val stopTime = System.nanoTime()
println("Perf = " + ((stopTime - startTime) / 1000000000.0) + " s")

Upvotes: 0

Views: 2163

Answers (2)

Rex Kerr

Reputation: 167901

16 MB easily fits in memory unless you're running this on a feature phone or something. No need to chunk it and make the logic harder.

Just gulp the whole file at once with java.nio.file.Files.readAllBytes:

val buffer = java.nio.file.Files.readAllBytes(myfile.toPath)

assuming you are not stuck with Java 1.6. (If you are stuck with Java 1.6, pre-allocate a buffer of myfile.length bytes and use read on a FileInputStream to fill it in one go. It's not much harder; just don't forget to close the stream!)
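The pre-Java-7 fallback described above can be sketched like this. It is a sketch, not part of the original answer: readAllBytes16 is a name I've made up, and the loop is there because a single read call may return fewer bytes than requested:

```scala
import java.io.{File, FileInputStream}

// Sketch of the Java 1.6 fallback: pre-allocate the buffer from the
// file length, then fill it from a FileInputStream in one pass.
def readAllBytes16(myfile: File): Array[Byte] = {
  val buffer = new Array[Byte](myfile.length.toInt)
  val in = new FileInputStream(myfile)
  try {
    var off = 0
    while (off < buffer.length) {
      // read may return fewer bytes than asked for, so loop until full
      val n = in.read(buffer, off, buffer.length - off)
      if (n < 0) throw new java.io.EOFException("file shrank while reading")
      off += n
    }
  } finally in.close() // don't forget to close it!
  buffer
}
```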

Then if you don't want to convert it yourself, you can

val bb = java.nio.ByteBuffer.wrap(buffer)
bb.order(java.nio.ByteOrder.nativeOrder)  // or BIG_ENDIAN to match the hi << 8 | lo layout
val shorts = new Array[Short](buffer.length/2)
bb.asShortBuffer.get(shorts)              // bulk copy, no boxing

And you're done.

Note that this is all Java stuff; there's nothing Scala-specific here save the syntax.

If you're wondering why this is so much faster than your code, it's because grouped(2) boxes each byte and places the pair in a fresh array. That's three allocations for every short you want! You can do the conversion yourself by indexing the array directly, and that will be fast, but why would you want to when ByteBuffer and friends do exactly what you need already?
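The "index the array directly" option mentioned above can be sketched as follows (convertFast is a name I've made up; big-endian, matching the question's hi << 8 | lo layout, and a trailing odd byte is ignored):

```scala
// Allocation-free conversion: a while loop over the byte array,
// writing straight into a pre-sized short array. No grouped, no boxing.
def convertFast(in: Array[Byte]): Array[Short] = {
  val out = new Array[Short](in.length / 2)
  var i = 0
  while (i < out.length) {
    // mask the low byte so sign extension doesn't clobber the high byte
    out(i) = ((in(2 * i) << 8) | (in(2 * i + 1) & 0xFF)).toShort
    i += 1
  }
  out
}
```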

If you really, really care about that last (odd) byte, size shorts as (buffer.length + 1)/2, bulk-copy only buffer.length/2 shorts, and then tack on if ((buffer.length & 1) == 1) shorts(shorts.length - 1) = ((buffer(buffer.length - 1) & 0xFF) << 8).toShort to fill in the last byte. (Note the parentheses: in Scala, == binds tighter than &.)
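Putting the whole ByteBuffer route together, odd byte included, might look like this (a sketch; toShorts is a name I've made up, and I've used BIG_ENDIAN to match the question's hi << 8 | lo layout rather than nativeOrder):

```scala
// Wrap the byte array, bulk-copy every complete pair through a
// ShortBuffer view, then pad the final short by hand if the length is odd.
def toShorts(buffer: Array[Byte]): Array[Short] = {
  val bb = java.nio.ByteBuffer.wrap(buffer)
  bb.order(java.nio.ByteOrder.BIG_ENDIAN)
  val shorts = new Array[Short]((buffer.length + 1) / 2)
  bb.asShortBuffer.get(shorts, 0, buffer.length / 2)
  if ((buffer.length & 1) == 1)
    shorts(shorts.length - 1) = ((buffer(buffer.length - 1) & 0xFF) << 8).toShort
  shorts
}
```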

Upvotes: 3

A couple of issues pop out:

If byteBuffer always has size 1024*1024, then the case Array(one) branch in convert will never be hit, so the pattern match is unnecessary.
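With even-sized chunks guaranteed, convert could drop the match and just index each pair directly (a sketch; convertNoMatch is a name I've made up, same big-endian layout as the question):

```scala
// No pattern match: every group from grouped(2) has exactly two bytes,
// so index them directly. Low byte is masked to avoid sign extension.
def convertNoMatch(in: Array[Byte]): Array[Short] =
  in.grouped(2).map(p => ((p(0) << 8) | (p(1) & 0xFF)).toShort).toArray
```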

Also, you can avoid the for loop with a tail-recursive function. After the val byteBuffer = ... line, you can replace chunkNums and the for loop with:

@scala.annotation.tailrec
def readAndConvert(b: List[Array[Short]], file: RandomAccessFile): List[Array[Short]] = {
  if (file.read(byteBuffer) < 0)
    b
  else {
    // read already advances the file pointer, so no seek or skipBytes is needed
    readAndConvert(convert(byteBuffer) +: b, file)
  }
}

val sValues = readAndConvert(List.empty[Array[Short]], file)

Note: because prepending to a list is much faster than appending, the loop above collects the converted chunks in the reverse of their order in the file.
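If you need the chunks back in file order, a single reverse at the end restores it. A minimal illustration (the sample values are made up):

```scala
// Prepending builds the list newest-chunk-first; one O(n) reverse at the
// end is cheaper than paying O(n) per append while reading.
val collectedNewestFirst = List(Array[Short](3), Array[Short](2), Array[Short](1))
val inFileOrder = collectedNewestFirst.reverse
```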

Upvotes: 0
