CoolGoose
CoolGoose

Reputation: 160

Spark dataframe from Json string with nested key

I have several columns to be extracted from json string. However one field has nested values. Not sure how to deal with that?

Need to explode into multiple rows to get values of field name, Value1, Value2.

import spark.implicits._

val df = Seq(
  ("1", """{"k": "foo", "v": 1.0}""", "some_other_field_1"),
  ("2", """{"p": "bar", "q": 3.0}""", "some_other_field_2"),
  ("3",
    """{"nestedKey":[ {"field name":"name1","Value1":false,"Value2":true},
      |                 {"field name":"name2","Value1":"100","Value2":"200"}
      |]}""".stripMargin, "some_other_field_3")

).toDF("id","json","other")

df.show(truncate = false)
val df1= df.withColumn("id1",col("id"))
  .withColumn("other1",col("other"))
  .withColumn("k",get_json_object(col("json"),"$.k"))
  .withColumn("v",get_json_object(col("json"),"$.v"))
  .withColumn("p",get_json_object(col("json"),"$.p"))
  .withColumn("q",get_json_object(col("json"),"$.q"))
  .withColumn("nestedKey",get_json_object(col("json"),"$.nestedKey"))
    .select("id1","other1","k","v","p","q","nestedKey")
df1.show(truncate = false)

Upvotes: 0

Views: 749

Answers (2)

CoolGoose
CoolGoose

Reputation: 160

i did it in one dataframe

 val df1= df.withColumn("id1",col("id"))
    .withColumn("other1",col("other"))
    .withColumn("k",get_json_object(col("json"),"$.k"))
    .withColumn("v",get_json_object(col("json"),"$.v"))
    .withColumn("p",get_json_object(col("json"),"$.p"))
    .withColumn("q",get_json_object(col("json"),"$.q"))
    .withColumn("nestedKey",get_json_object(col("json"),"$.nestedKey"))
  .withColumn(
    "nestedKey",
    expr("explode_outer(from_json(nestedKey, 'array<struct<`field name`:string, Value1:string, Value2:string>>'))")
  ).withColumn("fieldname",col("nestedKey.field name"))
    .withColumn("valueone",col("nestedKey.Value1"))
    .withColumn("valuetwo",col("nestedKey.Value2"))
   .select("id1","other1","k","v","p","q","fieldname","valueone","valuetwo")```


still working to make it more elegant

Upvotes: 0

mck
mck

Reputation: 42332

You can parse the nestedKey using from_json and explode it:

val df2 = df1.withColumn(
    "nestedKey", 
    expr("explode_outer(from_json(nestedKey, 'array<struct<`field name`:string, Value1:string, Value2:string>>'))")
).select("*", "nestedKey.*").drop("nestedKey")

df2.show
+---+------------------+----+----+----+----+----------+------+------+
|id1|            other1|   k|   v|   p|   q|field name|Value1|Value2|
+---+------------------+----+----+----+----+----------+------+------+
|  1|some_other_field_1| foo| 1.0|null|null|      null|  null|  null|
|  2|some_other_field_2|null|null| bar| 3.0|      null|  null|  null|
|  3|some_other_field_3|null|null|null|null|     name1| false|  true|
|  3|some_other_field_3|null|null|null|null|     name2|   100|   200|
+---+------------------+----+----+----+----+----------+------+------+

Upvotes: 2

Related Questions