Reputation: 3517
How can I use scalaz-stream to compute the digest of a file with java.security.MessageDigest?
I would like to do this with a constant-size memory buffer (for example 4 KB). I think I understand how to start reading the file, but I am struggling to understand how to:
1) call digest.update(buf) for each 4 KB chunk, which is effectively a side effect on the Java MessageDigest instance and which I guess should happen inside the scalaz-stream framework;
2) finally call digest.digest() to get the calculated digest back from within the scalaz-stream framework.
I think I roughly know how to start:
import scalaz.stream._
import java.security.MessageDigest
val f = "/a/b/myfile.bin"
val bufSize = 4096
val digest = MessageDigest.getInstance("SHA-256")
Process.constant(bufSize).toSource
.through(io.fileChunkR(f, bufSize))
But then I am stuck!
Any hints, please? I guess it must also be possible to wrap the creation, update, retrieval (of the actual digest calculation) and destruction of the digest object in a scalaz-stream Sink or something similar, and then call .to() passing in that Sink. Sorry if I am using the wrong terminology; I am completely new to scalaz-stream. I have been through a few of the examples but am still struggling.
Upvotes: 4
Views: 270
Reputation: 4775
Since version 0.4, scalaz-stream contains processes to calculate digests. They are available in the hash module and use java.security.MessageDigest under the hood. Here is a minimal example of how you could use them:
import scalaz.concurrent.Task
import scalaz.stream._
object Sha1Sum extends App {
val fileName = "testdata/celsius.txt"
val bufferSize = 4096
val sha1sum: Task[Option[String]] =
Process.constant(bufferSize)
.toSource
.through(io.fileChunkR(fileName, bufferSize))
.pipe(hash.sha1)
.map(sum => s"${sum.toHex} $fileName")
.runLast
sha1sum.run.foreach(println)
}
The update() and digest() calls are all contained inside the hash.sha1 Process1.
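For comparison, here is a minimal sketch of the same pattern written directly against java.security.MessageDigest, without scalaz-stream: one update call per fixed-size chunk, then a single digest call at the end. It is not the library's implementation, only an illustration of the calls involved. The names DigestSketch, digestStream and toHex are hypothetical, and it reads from an InputStream so that a FileInputStream for the file could be passed in.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.security.MessageDigest

object DigestSketch {
  // Hypothetical helper: hashes everything readable from `in`,
  // holding at most `bufSize` bytes of input in memory at a time.
  def digestStream(in: InputStream, algorithm: String, bufSize: Int): Array[Byte] = {
    val md  = MessageDigest.getInstance(algorithm)
    val buf = new Array[Byte](bufSize)
    var n   = in.read(buf)
    while (n != -1) {
      // Side effect on the digest: feed only the bytes actually read.
      md.update(buf, 0, n)
      n = in.read(buf)
    }
    // Finalise and return the hash once the input is exhausted.
    md.digest()
  }

  // Hypothetical helper: lower-case hex rendering of a byte array.
  def toHex(bytes: Array[Byte]): String =
    bytes.map(b => "%02x".format(b & 0xff)).mkString

  def main(args: Array[String]): Unit = {
    // In-memory input for illustration; a FileInputStream works the same way.
    val in = new ByteArrayInputStream("abc".getBytes("UTF-8"))
    try println(toHex(digestStream(in, "SHA-256", 4096)))
    finally in.close()
  }
}
```

The loop keeps memory use constant regardless of file size, which is the same property the chunked reader plus hash process gives you inside a stream.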
Upvotes: 3
Reputation: 3517
So I have something working, but it could probably be improved:
import java.io._
import java.security.MessageDigest
import resource._
import scodec.bits.ByteVector
import scalaz._, Scalaz._
import scalaz.concurrent.Task
import scalaz.stream._
import scalaz.stream.io._
val f = "/a/b/myfile.bin"
val bufSize = 4096
val md = MessageDigest.getInstance("SHA-256")
def _digestResource(md: => MessageDigest): Sink[Task,ByteVector] =
resource(Task.delay(md))(md => Task.delay(()))(
md => Task.now((bytes: ByteVector) => Task.delay(md.update(bytes.toArray))))
Process.constant(bufSize).toSource
.through(fileChunkR(f, bufSize)) // f is already a String path
.to(_digestResource(md))
.run
.run
md.digest()
However, it seems to me that there should be a cleaner way to do this: move the creation of the MessageDigest inside the scalaz-stream pipeline and have the final .run yield the md.digest() result.
Better answers welcome...
Upvotes: 0