Big data Hadoop dev.

Reputation: 169

Scala: read JSON and extract required column data

I am reading multi-line JSON files that contain more than 60 fields, but I require only 30 of those fields as columns. How do I get the required columns' data from the DataFrame?

scala> peopleDF.printSchema
root
 |-- Applications: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- b_als_o_isehp: boolean (nullable = true)
 |    |    |-- b_als_p_isehp: boolean (nullable = true)
 |    |    |-- b_als_s_isehp: boolean (nullable = true)
 |    |    |-- l_als_o_eventid: long (nullable = true)
 |    |    |-- l_als_o_pid: long (nullable = true)
 |    |    |-- l_als_o_sid: long (nullable = true)

How do I get only the required columns (like l_als_o_pid, l_als_o_eventid, b_als_o_isehp)?

 val peopleDF = spark.read.json("file:///root/users/inputjsondata/s_json2.json")
 val ss = peopleDF.select("Applications")
 ss.createOrReplaceTempView("result2")
 val child = ss.select(explode(peopleDF("Applications.t_als_s_path"))).toDF("app").show()

Upvotes: 1

Views: 1431

Answers (1)

koiralo

Reputation: 23119

You can explode the array field and then select the inner fields as follows:

import org.apache.spark.sql.functions.explode // imported automatically in the spark-shell
import spark.implicits._                      // provides the $"..." column syntax

val peopleDF = spark.read.json("file:///root/users/inputjsondata/s_json2.json")
val newDF = peopleDF.select(explode($"Applications").as("app"))
  .select("app.*")

Now you can select the fields directly, like l_als_o_pid, l_als_o_eventid, and b_als_o_isehp. Hope this helps!
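
For example, a minimal sketch that keeps just the three columns mentioned above (column names taken from the printed schema, and assuming the newDF from the snippet above):

 // Keep only the required inner fields after the explode
 val required = newDF.select("l_als_o_pid", "l_als_o_eventid", "b_als_o_isehp")
 required.show()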

Upvotes: 3
