Learner
Reputation: 1695

Suggestions for Writing Map as JSON file in Scala

I have a simple Map[K, V], myDictionary, that is populated by my program. At the end I want to write it to a text file as a JSON string, since I will need to parse it later.

I was using this code earlier,

Some(new PrintWriter(outputDir+"/myDictionary.json")).foreach{p => p.write(compact(render(decompose(myDictionary)))); p.close}

I found that it became slower as the input size increased. Later, I switched to this:

val out = new PrintWriter(outputDir+"/myDictionary.json")
out.println(scala.util.parsing.json.JSONObject(myDictionary.toMap).toString())
out.close() // flush buffered output and release the file handle

I ran this on sample input and found it to be a bit faster than my earlier approach. I expect the map to reach at least a million (K, V) entries (a text file larger than 1 GB), so I want to make sure I use a fast and memory-efficient approach for serializing it. What other approaches would you recommend that I look into to optimize this?

Upvotes: 0

Views: 2546

Answers (1)

0__
Reputation: 67280

The JSON support in the standard Scala library is probably not the best choice. Unfortunately, the situation with JSON libraries for Scala is a bit confusing: there are many alternatives (Lift JSON, Play JSON, Spray JSON, Twitter JSON, Argonaut, ...), basically one library for each day of the week. I suggest you have a look at these, at least to see whether any of them is easier to use and more performant.


Here is an example using Play JSON which I have chosen for particular reasons (being able to generate formats with macros):

object JsonTest extends App {
  import play.api.libs.json._

  type MyDict = Map[String, Int]

  implicit object MyDictFormat extends Format[MyDict] {
    def reads(json: JsValue): JsResult[MyDict] = json match {
      case JsObject(fields) =>
        val b = Map.newBuilder[String, Int]
        fields.foreach {
          case (k, JsNumber(v)) => b += k -> v.toInt
          case other => return JsError(s"Not a (string, number) pair: $other")
        }
        JsSuccess(b.result())

      case _ => JsError(s"Not an object: $json")
    }

    def writes(m: MyDict): JsValue = {
      // toSeq + map stands in for collection.breakOut, which was removed in Scala 2.13
      val fields: Seq[(String, JsValue)] = m.toSeq.map {
        case (k, v) => k -> JsNumber(v)
      }

      JsObject(fields)
    }
  }

  val m       = Map("hallo" -> 12, "gallo" -> 34)
  val serial  = Json.toJson(m)
  val text    = Json.stringify(serial)
  println(text)
  val back    = Json.fromJson[MyDict](serial)
  assert(back == JsSuccess(m), s"Failed: $back")
}

While you can construct and deconstruct JsValues directly, the main idea is to use a Format[A] where A is the type of your data structure. This puts more emphasis on type safety than the JSON support in the standard Scala library. It looks more verbose, but in the end I think it's the better approach.

There are utility methods Json.toJson and Json.fromJson which look for an implicit format of the type you want.

On the other hand, it does construct everything in-memory and it does duplicate your data structure (because for each entry in your map you will have another tuple (String, JsValue)), so this isn't necessarily the most memory efficient solution, given that you are operating in the GB magnitude...
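
To illustrate what avoiding that duplication could look like, here is a minimal sketch that writes the entries one at a time to a java.io.Writer, so that no intermediate JsValue tree or full JSON string is held in memory. It uses only the standard library and assumes a Map[String, Int] as in the example above. The names StreamingDictWriter, escape and writeDict are made up for this sketch, and the escaping covers only quotes, backslashes and control characters, so treat it as a starting point rather than a complete JSON writer.

```scala
import java.io.Writer

object StreamingDictWriter {
  // Minimal JSON string escaping: quotes, backslashes and control characters.
  def escape(s: String): String = {
    val sb = new StringBuilder(s.length + 2)
    sb.append('"')
    s.foreach {
      case '"'          => sb.append('\\').append('"')
      case '\\'         => sb.append('\\').append('\\')
      case '\n'         => sb.append('\\').append('n')
      case '\r'         => sb.append('\\').append('r')
      case '\t'         => sb.append('\\').append('t')
      case c if c < ' ' => sb.append('\\').append('u').append("%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.append('"')
    sb.toString
  }

  // Writes the entries as a single JSON object, one entry at a time.
  // The caller owns the writer and is responsible for closing it.
  def writeDict(entries: Iterator[(String, Int)], out: Writer): Unit = {
    out.write("{")
    var first = true
    entries.foreach { case (k, v) =>
      if (!first) out.write(",")
      first = false
      out.write(escape(k))
      out.write(":")
      out.write(v.toString)
    }
    out.write("}")
  }
}
```

For a large file you would probably wrap a FileWriter in a BufferedWriter, call StreamingDictWriter.writeDict(myDictionary.iterator, out), and close the writer afterwards.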


Jerkson is a Scala wrapper for the Java JSON library Jackson. The latter apparently has the feature to stream data. I found this project which says it adds streaming support. Play JSON in turn is based on Jerkson, so perhaps you can even figure out how to stream your object with that. See also this question.
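
As a rough sketch of what streaming with Jackson itself might look like, the following uses Jackson's JsonGenerator directly from Scala. It assumes jackson-core is on the classpath and again a Map[String, Int]; the object name JacksonStreamSketch and the method writeDict are hypothetical, and I have not measured how this performs on GB-sized input.

```scala
import java.io.Writer
import com.fasterxml.jackson.core.JsonFactory

object JacksonStreamSketch {
  // Streams the entries as one JSON object without building a tree in memory.
  // Assumption: jackson-core is available as a dependency.
  def writeDict(entries: Iterable[(String, Int)], out: Writer): Unit = {
    val gen = new JsonFactory().createGenerator(out)
    try {
      gen.writeStartObject()
      entries.foreach { case (k, v) => gen.writeNumberField(k, v) }
      gen.writeEndObject()
    } finally {
      // With default settings, closing the generator also closes the writer.
      gen.close()
    }
  }
}
```

The generator takes care of escaping the keys, so this stays close to the hand-written loop while leaving the JSON details to the library.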

Upvotes: 4
