Reputation: 13
I'm struggling with an issue of exploding a json array column with Spark.
I have a dataframe that looks like:
+------+------------------------------------------------------------------------+
|id |struct |
+------+------------------------------------------------------------------------+
| 1 | [{_name: BankAccount, _value: 123456}, {_name: Balance, _value: 500$}]|
| 2 | [{_name: BankAccount, _value: 098765}, {_name: Balance, _value: 100$}]|
| 3 | [{_name: BankAccount, _value: 135790}, {_name: Balance, _value: 200$}]|
+------+------------------------------------------------------------------------+
and I want it to be as
+------+------------+--------+
|id | BankAccount| Balance|
+------+------------+--------+
| 1 | 123456 | 500$ |
| 2 | 098765 | 100$ |
| 3 | 135790 | 200$ |
+------+------------+--------+
Of course it is not really exploding but I couldn't be remotely close to the outcome of what I need.
Thanks for your help!
Upvotes: 0
Views: 171
Reputation: 10372
Check below code.
From your sample data instead of struct
I have used data
for simplicity.. :)
val df = Seq((1,"""[{"_name":"BankAccount","_value":"123456"},{"_name":"Balance","_value": "500$"}]"""),(2,"""[{"_name":"BankAccount","_value":"098765"},{"_name":"Balance","_value": "100$"}]"""),(3,"""[{"_name":"BankAccount","_value":"135790"},{"_name":"Balance","_value": "200$"}]""")).toDF("id","data")
Print Schema of data
scala> df.printSchema
root
|-- id: integer (nullable = false)
|-- data: string (nullable = true)
Showing Sample Data
scala> df.show(false)
+---+--------------------------------------------------------------------------------+
|id |data |
+---+--------------------------------------------------------------------------------+
|1 |[{"_name":"BankAccount","_value":"123456"},{"_name":"Balance","_value": "500$"}]|
|2 |[{"_name":"BankAccount","_value":"098765"},{"_name":"Balance","_value": "100$"}]|
|3 |[{"_name":"BankAccount","_value":"135790"},{"_name":"Balance","_value": "200$"}]|
+---+--------------------------------------------------------------------------------+
Creating schema for json data
scala> val schema = ArrayType(MapType(StringType,StringType))
Using explode
, groupBy
& pivot
to get expected result.
Note - You may need to tune below code little bit as per your requirement.
scala>
df
.withColumn("data",explode(from_json($"data",schema)))
.select($"id",struct($"data"("_name").as("key"),$"data"("_value").as("value")).as("data"))
.select($"id",$"data.*")
.groupBy($"id")
.pivot($"key")
.agg(first($"value"))
.select("id","BankAccount","Balance")
.orderBy($"id".asc)
.show(false)
Final Result
+---+-----------+-------+
|id |BankAccount|Balance|
+---+-----------+-------+
|1 |123456 |500$ |
|2 |098765 |100$ |
|3 |135790 |200$ |
+---+-----------+-------+
Upvotes: 2