satyambansal117

Reputation: 193

How to load json data in form of map in spark sql?

I have JSON data as shown below:

{
  "vScore": {
    "300x600": {
      "v1": "0.50",
      "v2": "0.67",
      "v3": "ATF",
      "v4": "H2",
      "v5": "0.11"
    },
    "728x90": {
      "v1": "0.48",
      "v2": "0.57",
      "v3": "Unknown",
      "v4": "H2",
      "v5": "0.51"
    },
    "300x250": {
      "v1": "0.64",
      "v2": "0.77",
      "v3": "ATF",
      "v4": "H2",
      "v5": "0.70"
    }
  }
}

I want to load this JSON data as a map, i.e. load vScore into a map so that each resolution (e.g. 300x250) becomes a key and the nested v1...v5 fields become that key's value. How can I do this in Spark SQL with Scala?

Upvotes: 1

Views: 3982

Answers (2)

Iraj Hedayati

Reputation: 1687

I was looking for a similar thing. So, here is a simple way to achieve it.

I'm using Spark Shell here.

Spark always considers a JSON object a struct unless you explicitly provide a schema. So, first, define a schema and read the file.

scala> import org.apache.spark.sql.types._

scala> val schema = StructType(Array(StructField("vScore", MapType(StringType, MapType(StringType, StringType)))))

scala> val df = spark.read.schema(schema).option("multiLine","true").json("/Users/iraj/Downloads/so.json")

scala> df.printSchema()
root
 |-- vScore: map (nullable = true)
 |    |-- key: string
 |    |-- value: map (valueContainsNull = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
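As an aside, once vScore is read as a map you can already look up a single resolution key directly on the DataFrame. A minimal sketch, assuming Spark 2.4+ for the element_at function:

scala> df.selectExpr("element_at(vScore, '300x250')").show(truncate = false)

element_at(map, key) returns the value for the given key (null if absent under default settings), so this prints the inner v1...v5 map for 300x250.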

Then I suggest switching to a Dataset by defining a case class; it's easier. You can still do the same thing using DataFrame and SQL expressions (a sketch of that follows the output below).

scala> import spark.implicits._

scala> case class MyDataModel(vScore: Map[String, Map[String, String]])

scala> val ds = df.as[MyDataModel]

Now transform it to what you want.

scala> val df2 = ds.map(row => row.vScore.map{ case (resolution, values) => resolution -> values.values.toList}).toDF("vScore")

scala> df2.printSchema
root
 |-- vScore: map (nullable = true)
 |    |-- key: string
 |    |-- value: array (valueContainsNull = true)
 |    |    |-- element: string (containsNull = true)

scala> df2.show(truncate = false)
+---------------------------------------------------------------------------------------------------------------------------+
|vScore                                                                                                                     |
+---------------------------------------------------------------------------------------------------------------------------+
|{300x600 -> [0.67, 0.11, 0.50, H2, ATF], 728x90 -> [0.57, 0.51, 0.48, H2, Unknown], 300x250 -> [0.77, 0.70, 0.64, H2, ATF]}|
+---------------------------------------------------------------------------------------------------------------------------+
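As mentioned above, the same transformation also works without a case class, using SQL expressions on the DataFrame. A minimal sketch, assuming Spark 3.0+ for the transform_values and map_values higher-order functions (df2Sql is just my name for the result):

scala> val df2Sql = df.selectExpr("transform_values(vScore, (k, v) -> map_values(v)) AS vScore")

This turns each inner map<string,string> into an array of its values, yielding the same map<string, array<string>> schema as df2 above.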

As for the transformation part, I'm not sure I understood your desired output correctly, so let me know and I can modify it.

Upvotes: 0

Soufian

Reputation: 99

  1. You need to load your data using
data = sqlContext.read.json("file")
  2. You can check how your data was loaded with
data.printSchema()
  3. Get your data with a "select" query (see the sketch below), using
data.select....
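To make step 3 concrete against the question's data, here is a hedged Scala sketch (the path "file" is a placeholder, and the multiLine option, available since Spark 2.2, is only needed because the JSON here spans multiple lines):

// Without an explicit schema, Spark infers vScore as a struct rather than
// a map, so the nested fields are addressed by name. Backticks are needed
// because "300x250" starts with a digit.
val data = sqlContext.read.option("multiLine", "true").json("file")
data.printSchema()
data.select("vScore.`300x250`.v1").show()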

More: How to parse jsonfile with spark

Upvotes: 0
