adamretter
adamretter

Reputation: 3517

Using scalaz-stream to calculate a digest

So I was wondering how I might use scalaz-stream to generate the digest of a file using java.security.MessageDigest?

I would like to do this using a constant memory buffer size (for example 4KB). I think I understand how to start with reading the file, but I am struggling to understand how to:

1) call digest.update(buf) for each 4KB which effectively is a side-effect on the Java MessageDigest instance, which I guess should happen inside the scalaz-stream framework.

2) finally call digest.digest() to receive back the calculated digest from within the scalaz-stream framework some how?

I think I understand kinda how to start:

import scalaz.stream._
import java.security.MessageDigest

val f = "/a/b/myfile.bin"
val bufSize = 4096

val digest = MessageDigest.getInstance("SHA-256")

Process.constant(bufSize).toSource
  .through(io.fileChunkR(f, bufSize))

But then I am stuck! Any hints please? I guess it must also be possible to wrap the creation, update, retrieval (of actual digest calculatuon) and destruction of digest object in a scalaz-stream Sink or something, and then call .to() passing in that Sink? Sorry if I am using the wrong terminology, I am completely new to using scalaz-stream. I have been through a few of the examples but am still struggling.

Upvotes: 4

Views: 270

Answers (2)

Frank S. Thomas
Frank S. Thomas

Reputation: 4775

Since version 0.4 scalaz-stream contains processes to calculate digests. They are available in the hash module and use java.security.MessageDigest under the hood. Here is a minimal example how you could use them:

import scalaz.concurrent.Task
import scalaz.stream._

object Sha1Sum extends App {
  val fileName = "testdata/celsius.txt"
  val bufferSize = 4096

  val sha1sum: Task[Option[String]] =
    Process.constant(bufferSize)
      .toSource
      .through(io.fileChunkR(fileName, bufferSize))
      .pipe(hash.sha1)
      .map(sum => s"${sum.toHex}  $fileName")
      .runLast

  sha1sum.run.foreach(println)
}

The update() and digest() calls are all contained inside the hash.sha1 Process1.

Upvotes: 3

adamretter
adamretter

Reputation: 3517

So I have something working, but it could probably be improved:

import java.io._
import java.security.MessageDigest
import resource._
import scodec.bits.ByteVector
import scalaz._, Scalaz._
import scalaz.concurrent.Task
import scalaz.stream._
import scalaz.stream.io._

val f = "/a/b/myfile.bin"
val bufSize = 4096

val md = MessageDigest.getInstance("SHA-256")

def _digestResource(md: => MessageDigest): Sink[Task,ByteVector] =
      resource(Task.delay(md))(md => Task.delay(()))(
        md => Task.now((bytes: ByteVector) => Task.delay(md.update(bytes.toArray))))

Process.constant(4096).toSource
    .through(fileChunkR(f.getAbsolutePath, 4096))
    .to(_digestResource(md))
    .run
    .run

md.digest()

However, it seems to me that there should be a cleaner way to do this, by moving the creation of the MessageDigest inside the scalaz-stream stuff and have the final .run yield the md.digest().

Better answers welcome...

Upvotes: 0

Related Questions