Reputation: 160
I have several columns to be extracted from json string. However one field has nested values. Not sure how to deal with that?
Need to explode into multiple rows to get values of field name, Value1, Value2.
import spark.implicits._
val df = Seq(
("1", """{"k": "foo", "v": 1.0}""", "some_other_field_1"),
("2", """{"p": "bar", "q": 3.0}""", "some_other_field_2"),
("3",
"""{"nestedKey":[ {"field name":"name1","Value1":false,"Value2":true},
| {"field name":"name2","Value1":"100","Value2":"200"}
|]}""".stripMargin, "some_other_field_3")
).toDF("id","json","other")
df.show(truncate = false)
val df1= df.withColumn("id1",col("id"))
.withColumn("other1",col("other"))
.withColumn("k",get_json_object(col("json"),"$.k"))
.withColumn("v",get_json_object(col("json"),"$.v"))
.withColumn("p",get_json_object(col("json"),"$.p"))
.withColumn("q",get_json_object(col("json"),"$.q"))
.withColumn("nestedKey",get_json_object(col("json"),"$.nestedKey"))
.select("id1","other1","k","v","p","q","nestedKey")
df1.show(truncate = false)
Upvotes: 0
Views: 749
Reputation: 160
i did it in one dataframe
val df1= df.withColumn("id1",col("id"))
.withColumn("other1",col("other"))
.withColumn("k",get_json_object(col("json"),"$.k"))
.withColumn("v",get_json_object(col("json"),"$.v"))
.withColumn("p",get_json_object(col("json"),"$.p"))
.withColumn("q",get_json_object(col("json"),"$.q"))
.withColumn("nestedKey",get_json_object(col("json"),"$.nestedKey"))
.withColumn(
"nestedKey",
expr("explode_outer(from_json(nestedKey, 'array<struct<`field name`:string, Value1:string, Value2:string>>'))")
).withColumn("fieldname",col("nestedKey.field name"))
.withColumn("valueone",col("nestedKey.Value1"))
.withColumn("valuetwo",col("nestedKey.Value2"))
.select("id1","other1","k","v","p","q","fieldname","valueone","valuetwo")```
still working to make it more elegant
Upvotes: 0
Reputation: 42332
You can parse the nestedKey using from_json
and explode it:
val df2 = df1.withColumn(
"nestedKey",
expr("explode_outer(from_json(nestedKey, 'array<struct<`field name`:string, Value1:string, Value2:string>>'))")
).select("*", "nestedKey.*").drop("nestedKey")
df2.show
+---+------------------+----+----+----+----+----------+------+------+
|id1| other1| k| v| p| q|field name|Value1|Value2|
+---+------------------+----+----+----+----+----------+------+------+
| 1|some_other_field_1| foo| 1.0|null|null| null| null| null|
| 2|some_other_field_2|null|null| bar| 3.0| null| null| null|
| 3|some_other_field_3|null|null|null|null| name1| false| true|
| 3|some_other_field_3|null|null|null|null| name2| 100| 200|
+---+------------------+----+----+----+----+----------+------+------+
Upvotes: 2