Reputation: 21
I have a Spark data frame as given below:
id | col1 | col2 |
---|---|---|
1 | [{"a":1}] | [{"d": 3, "e": 4}] |
2 | [{"a":2}] | [{"d": 5, "e": 10}] |
I want to obtain the following data frame:
id | col2_sum |
---|---|
1 | 7 |
2 | 15 |
Datatypes:
id: StringType
col1: StringType
col2: StringType
Thanks in advance.
Upvotes: 0
Views: 398
Reputation: 32710
Convert the JSON string into an array of maps using from_json, then use the aggregate function to sum the map values:
import org.apache.spark.sql.functions._
import spark.implicits._ // for toDF

val df = Seq(
  (1, """[{"a":1}]""", """[{"d": 3, "e": 4}]"""),
  (2, """[{"a":2}]""", """[{"d": 5, "e": 10}]""")
).toDF("id", "col1", "col2")

val df1 = (df
  // parse the JSON string into array<map<string,int>>
  .withColumn("col2", from_json(col("col2"), lit("array<map<string,int>>")))
  // collect all map values into a single flat array of ints
  .withColumn("col2", flatten(expr("transform(col2, x -> map_values(x))")))
  // fold the array, summing the elements
  .withColumn("col2_sum", expr("aggregate(col2, 0, (acc, x) -> acc + x)"))
  .drop("col1", "col2")
)
df1.show
//+---+--------+
//| id|col2_sum|
//+---+--------+
//| 1| 7|
//| 2| 15|
//+---+--------+
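For reference, the three steps can also be collapsed into a single selectExpr call. This is just a sketch of an equivalent formulation, assuming the same df as above (the higher-order functions transform, flatten and aggregate require Spark 2.4+):

// Parse, flatten and sum in one SQL expression; no intermediate columns to drop
val df2 = df.selectExpr(
  "id",
  "aggregate(flatten(transform(from_json(col2, 'array<map<string,int>>'), x -> map_values(x))), 0, (acc, x) -> acc + x) as col2_sum"
)
df2.show
//+---+--------+
//| id|col2_sum|
//+---+--------+
//|  1|       7|
//|  2|      15|
//+---+--------+

It produces the same result, and since only id and col2_sum are selected, there is nothing to drop afterwards.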
Upvotes: 1