Mahindar Boregam

Reputation: 855

Spark Read json object data as MapType

I have written a sample Spark app in which I create a DataFrame with a MapType column and write it to disk as JSON. Then I read the same file back and print its schema. But the output file's schema differs from the input schema, and the MapType is gone. How can I read that output file so that the MapType is preserved?

Code

import org.apache.spark.sql.{SaveMode, SparkSession}

case class Department(Id:String,Description:String)
case class Person(name:String,department:Map[String,Department])

object sample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local").appName("Custom Poc").getOrCreate
    import spark.implicits._

    val schemaData = Seq(
      Person("Persion1", Map("It" -> Department("1", "It Department"), "HR" -> Department("2", "HR Department"))),
      Person("Persion2", Map("It" -> Department("1", "It Department")))
    )
    val df = spark.sparkContext.parallelize(schemaData).toDF()
    println("Input schema")
    df.printSchema()
    df.write.mode(SaveMode.Overwrite).json("D:\\save\\output")

    println("Output schema")
    spark.read.json("D:\\save\\output\\*.json").printSchema()
  }
}

Output

Input schema
root
 |-- name: string (nullable = true)
 |-- department: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- Id: string (nullable = true)
 |    |    |-- Description: string (nullable = true)
Output schema
root
 |-- department: struct (nullable = true)
 |    |-- HR: struct (nullable = true)
 |    |    |-- Description: string (nullable = true)
 |    |    |-- Id: string (nullable = true)
 |    |-- It: struct (nullable = true)
 |    |    |-- Description: string (nullable = true)
 |    |    |-- Id: string (nullable = true)
 |-- name: string (nullable = true)

JSON file

{"name":"Persion1","department":{"It":{"Id":"1","Description":"It Department"},"HR":{"Id":"2","Description":"HR Department"}}}
{"name":"Persion2","department":{"It":{"Id":"1","Description":"It Department"}}}

EDIT: The file-saving part above is only there to illustrate my requirement. In the actual scenario I will just be reading the JSON data shown above and working with that DataFrame.
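
In other words, the read in the real job is roughly just this (a sketch; it relies on the SparkSession from the code above, and the path simply stands in for wherever the JSON lives):

val df = spark.read.json("D:\\save\\output\\*.json")
// schema inference turns department into a struct with HR/It fields here,
// but I need it back as a Map[String, Department] column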

Upvotes: 1

Views: 4552

Answers (1)

koiralo

Reputation: 23109

You can pass the schema from the previous dataframe while reading the JSON data. JSON itself has no map type, so Spark's schema inference reads each department object as a struct whose fields are the map keys; supplying the schema explicitly keeps the MapType.

println("Input schema")
df.printSchema()
df.write.mode(SaveMode.Overwrite).json("D:\\save\\output")

println("Output schema")
spark.read.schema(df.schema).json("D:\\save\\output").printSchema()

Input schema

root
 |-- name: string (nullable = true)
 |-- department: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- Id: string (nullable = true)
 |    |    |-- Description: string (nullable = true)

Output schema

root
 |-- name: string (nullable = true)
 |-- department: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- Id: string (nullable = true)
 |    |    |-- Description: string (nullable = true)
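
If there is no previous dataframe to borrow the schema from (as in the edited question, where only the JSON file is read), the same schema can be built by hand with StructType/MapType or derived from the case classes through an encoder. A minimal sketch, reusing the Person and Department case classes and the path from the question:

import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types._

// Option 1: spell the schema out explicitly
val departmentType = StructType(Seq(
  StructField("Id", StringType),
  StructField("Description", StringType)
))
val personSchema = StructType(Seq(
  StructField("name", StringType),
  StructField("department", MapType(StringType, departmentType))
))
spark.read.schema(personSchema).json("D:\\save\\output\\*.json").printSchema()

// Option 2: derive the same schema from the case classes
spark.read.schema(Encoders.product[Person].schema).json("D:\\save\\output\\*.json").printSchema()

Either way the department column should come back as a map rather than a struct.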

Hope this helps!

Upvotes: 3
