Reputation: 1106
I have the following Spark DataFrame and I want to aggregate all the columns into one column as JSON. If the input DataFrame is:
key,name,title
123,hsd,jds
148,sdf,qsz
589,qsz,aze
the expected result will be:
key,name,title,aggregation
123,hsd,jds,{"key":"123","name":"hsd", "title":"jds"}
148,sdf,qsz,{"key":"148","name":"sdf", "title":"qsz"}
589,qsz,aze,{"key":"589","name":"qsz", "title":"aze"}
The solution must not hardcode the field names. Any idea how to do this, please?
Upvotes: 0
Views: 736
Reputation: 23119
You can use the to_json function:
import org.apache.spark.sql.functions._
import spark.implicits._  // for toDF and the $ column syntax

val df = Seq(
  (123, "hsd", "jds"),
  (148, "sdf", "qsz"),
  (589, "qsz", "aze")
).toDF("key", "name", "title")

df.withColumn("aggregation", to_json(struct($"key", $"name", $"title")))
  .show(false)
If you have many columns, you can build the struct from all of them without naming each one:
df.withColumn("aggregation", to_json(struct(df.columns.map(col): _*)))
Output:
+---+----+-----+--------------------------------------+
|key|name|title|aggregation |
+---+----+-----+--------------------------------------+
|123|hsd |jds |{"key":123,"name":"hsd","title":"jds"}|
|148|sdf |qsz |{"key":148,"name":"sdf","title":"qsz"}|
|589|qsz |aze |{"key":589,"name":"qsz","title":"aze"}|
+---+----+-----+--------------------------------------+
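A further shorthand (an assumption worth checking on your Spark version: star expansion inside struct) avoids mapping over df.columns entirely:

```scala
// Expand all columns of df into the struct via "*",
// then serialize the struct to a JSON string per row.
df.withColumn("aggregation", to_json(struct(col("*"))))
  .show(false)
```

This produces the same result as the `df.columns.map(col)` version above.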
Upvotes: 2
Reputation: 4501
You can also use to_json with a map built from all the columns. Note that a map needs a single value type, so every value ends up rendered as a JSON string (compare "key":"123" below with "key":123 from the struct approach):
import org.apache.spark.sql.functions._
import spark.implicits._

val dfA = Seq(
  (123, "hsd", "jds"),
  (148, "sdf", "qsz"),
  (589, "qsz", "aze")
).toDF("key", "name", "title")

dfA.withColumn("aggregation", to_json(
  map(dfA.columns.flatMap(columnName => Seq(lit(columnName), col(columnName))): _*))
).show(truncate = false)
+---+----+-----+----------------------------------------+
|key|name|title|aggregation                             |
+---+----+-----+----------------------------------------+
|123|hsd |jds |{"key":"123","name":"hsd","title":"jds"}|
|148|sdf |qsz |{"key":"148","name":"sdf","title":"qsz"}|
|589|qsz |aze |{"key":"589","name":"qsz","title":"aze"}|
+---+----+-----+----------------------------------------+
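One advantage of building the map from `dfA.columns` is that you can filter the column list first. For example, to keep everything except key out of the JSON (the predicate here is just an illustration):

```scala
// Build the JSON map from all columns except "key".
dfA.withColumn("aggregation", to_json(
  map(dfA.columns.filterNot(_ == "key")
    .flatMap(c => Seq(lit(c), col(c))): _*))
).show(truncate = false)
```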
Upvotes: 0