enphyr

Reputation: 23

How to read json file and convert to case class with Spark and Spray Json

I have a text file containing json lines whose structure is as shown below.

{"city": "London","street": null, "place": "Pizzaria", "foo": "Bar"}

I need to read it in as JSON with Spark and transform it into a case class using the Scala code below. I only need the fields defined in the case class; any extra JSON fields should be ignored.

import org.apache.spark.sql.SparkSession
import spray.json.DefaultJsonProtocol
import spray.json._


object SimpleExample extends DefaultJsonProtocol {

  case class Row(city: String,
                 street: Option[String],
                 place: String)

  implicit val rowFormat = jsonFormat3(Row)

  def main(args: Array[String]): Unit = {

    val logFile = "example.txt"
    val spark = SparkSession.builder.appName("Simple Application").getOrCreate()
    val logData = spark.read.textFile(logFile).cache()

    import spark.implicits._

    val parsed = logData.map(line => line.parseJson.convertTo[Row])

    println(s"Total Count : ${parsed.count()}")

    spark.stop()
  }
}

However, when I run my Spark application, I get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: spray/json/JsonFormat
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
        at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
        at java.lang.Class.getMethod0(Class.java:3018)
        at java.lang.Class.getMethod(Class.java:1784)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:42)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: spray.json.JsonFormat
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

I am guessing the mistake is related to the imports, but I could not solve it.

Upvotes: 2

Views: 4327

Answers (1)

Shaido

Reputation: 28392

You can read the data directly as JSON (without spray-json) and then convert it into a Dataset.

import spark.implicits._

val logData = spark.read.json(logFile)
logData.select("city", "street", "place").as[Row]

As long as the field names in the case class match those in the file, this will work without problems.
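Putting this together with the case class from the question, a minimal sketch of the full program might look like the following (the `Row` case class and the `example.txt` file name are taken from the question; note that no spray-json dependency is needed at all with this approach):

```scala
import org.apache.spark.sql.SparkSession

object SimpleExample {

  // Same case class as in the question; only these three fields are kept.
  case class Row(city: String,
                 street: Option[String],
                 place: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("Simple Application").getOrCreate()

    // as[Row] needs an implicit Encoder, provided by spark.implicits._
    import spark.implicits._

    // spark.read.json expects one JSON object per line by default,
    // which matches the input file's structure.
    val logData = spark.read.json("example.txt")

    // select() drops extra fields such as "foo";
    // as[Row] then maps each remaining row onto the case class.
    val parsed = logData.select("city", "street", "place").as[Row]

    println(s"Total Count : ${parsed.count()}")

    spark.stop()
  }
}
```

This also sidesteps the original `NoClassDefFoundError`, since that error indicates spray-json was on the compile classpath but not available at runtime when submitting the job.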

Upvotes: 3
