Convert a Spark's Data-frame's Json column to Array of Object

Question

I have a dataframe with JSON column. JSON basically contains array of key and value as in below example.

Col1
=====================================================================
|{“Name”:”Ram”,”Place”:”RamGarh”}                                    |
|{“Name”:”Lakshman”,”Place”:”LakshManPur”.”DepartMent”:”Operations”} |
|{“Name”:”Sita”,”Place”:”SitaPur”,”Experience”,”14”}                 |

I need to parse this JSON data. What should be most efficient way?

I need to present it form of

case class dfCol(col:String, valu:String)

So basically I need to parse json of every row of that dataframe and convert in form

 |   Col
 |   ==========================================================
 |   Array(dfCol(Name,Ram),dfCOl(Place,Ramgarh))
 |   Array(dfCol(Name,Lakshman),dfCOl(Place,LakshManPur),dfCOl(DepartMent,Operations))
 |   Array(dfCol(Name,Sita),dfCOl(Place,SitaPur),dfCOl(Experience,14))

Som · Accepted Answer

Use this -

case class dfCol(col:String, valu:String)

Load the test data provided

val data =
      """
        |{"Name":"Ram","Place":"RamGarh"}
        |{"Name":"Lakshman","Place":"LakshManPur","DepartMent":"Operations"}
        |{"Name":"Sita","Place":"SitaPur","Experience":14.0}
      """.stripMargin
    val df = spark.read.json(data.split(System.lineSeparator()).toSeq.toDS())
    df.show(false)
    df.printSchema()
    /**
      * +----------+----------+--------+-----------+
      * |DepartMent|Experience|Name    |Place      |
      * +----------+----------+--------+-----------+
      * |null      |null      |Ram     |RamGarh    |
      * |Operations|null      |Lakshman|LakshManPur|
      * |null      |14.0      |Sita    |SitaPur    |
      * +----------+----------+--------+-----------+
      *
      * root
      * |-- DepartMent: string (nullable = true)
      * |-- Experience: double (nullable = true)
      * |-- Name: string (nullable = true)
      * |-- Place: string (nullable = true)
      */

Convert `Row -> Array[dfCol]`

   val ds: Dataset[Array[dfCol]] = df.map(row => {
      row.getValuesMap[String](row.schema.map(_.name))
        .filter(_._2 != null)
        .map{f => dfCol(f._1, String.valueOf(f._2))}
        .toArray
    })
    ds.show(false)
    ds.printSchema()

    // +------------------------------------------------------------------+
    //|value                                                             |
    //+------------------------------------------------------------------+
    //|[[Name, Ram], [Place, RamGarh]]                                   |
    //|[[DepartMent, Operations], [Name, Lakshman], [Place, LakshManPur]]|
    //|[[Experience, 14.0], [Name, Sita], [Place, SitaPur]]              |
    //+------------------------------------------------------------------+
    //
    //root
    // |-- value: array (nullable = true)
    // |    |-- element: struct (containsNull = true)
    // |    |    |-- col: string (nullable = true)
    // |    |    |-- valu: string (nullable = true)

Convert a Spark's Data-frame's Json column to Array of Object

Answers (2)

Load the test data provided

Convert `Row -> Array[dfCol]`

Related Questions

Convert a Spark&#39;s Data-frame&#39;s Json column to Array of Object

Answers (2)

Load the test data provided

Convert Row -> Array[dfCol]

Related Questions

Convert a Spark's Data-frame's Json column to Array of Object

Convert `Row -> Array[dfCol]`