Reputation: 1106
I have the following Spark DataFrame and I want to aggregate all the columns into one column as JSON. If the input DataFrame is:
key,name,title
123,hsd,jds
148,sdf,qsz
589,qsz,aze
the expected result will be:
key,name,title,aggregation
123,hsd,jds,{"key":"123","name":"hsd", "title":"jds"}
148,sdf,qsz,{"key":"148","name":"sdf", "title":"qsz"}
589,qsz,aze,{"key":"589","name":"qsz", "title":"aze"}
The solution must not hardcode the field names. Any idea how to do this, please?
Upvotes: 0
Views: 736
Reputation: 23119
You can use the to_json function:
import org.apache.spark.sql.functions._
import spark.implicits._  // for toDF and the $ column syntax

val df = Seq(
  (123, "hsd", "jds"),
  (148, "sdf", "qsz"),
  (589, "qsz", "aze")
).toDF("key", "name", "title")

df.withColumn("aggregation", to_json(struct($"key", $"name", $"title")))
  .show(false)
If you have many columns, you can build the struct from all of them without naming each one:
df.withColumn("aggregation", to_json(struct(df.columns.map(col): _*)))
Output:
+---+----+-----+--------------------------------------+
|key|name|title|aggregation |
+---+----+-----+--------------------------------------+
|123|hsd |jds |{"key":123,"name":"hsd","title":"jds"}|
|148|sdf |qsz |{"key":148,"name":"sdf","title":"qsz"}|
|589|qsz |aze |{"key":589,"name":"qsz","title":"aze"}|
+---+----+-----+--------------------------------------+
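A further shorthand (an assumption worth checking on your Spark version: star expansion inside struct) avoids mapping over df.columns entirely:

```scala
// Expand all columns of df into the struct via "*",
// then serialize the struct to a JSON string per row.
df.withColumn("aggregation", to_json(struct(col("*"))))
  .show(false)
```

This produces the same result as the `df.columns.map(col)` version above.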
Upvotes: 2
Reputation: 4501
You can also use to_json with a map built from all the columns. Note that a map needs a single value type, so every value ends up rendered as a JSON string (compare "key":"123" below with "key":123 from the struct approach):
import org.apache.spark.sql.functions._
import spark.implicits._

val dfA = Seq(
  (123, "hsd", "jds"),
  (148, "sdf", "qsz"),
  (589, "qsz", "aze")
).toDF("key", "name", "title")

dfA.withColumn("aggregation", to_json(
  map(dfA.columns.flatMap(columnName => Seq(lit(columnName), col(columnName))): _*))
).show(truncate = false)
+---+----+-----+----------------------------------------+
|key|name|title|aggregation                             |
+---+----+-----+----------------------------------------+
|123|hsd |jds |{"key":"123","name":"hsd","title":"jds"}|
|148|sdf |qsz |{"key":"148","name":"sdf","title":"qsz"}|
|589|qsz |aze |{"key":"589","name":"qsz","title":"aze"}|
+---+----+-----+----------------------------------------+
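One advantage of building the map from `dfA.columns` is that you can filter the column list first. For example, to keep everything except key out of the JSON (the predicate here is just an illustration):

```scala
// Build the JSON map from all columns except "key".
dfA.withColumn("aggregation", to_json(
  map(dfA.columns.filterNot(_ == "key")
    .flatMap(c => Seq(lit(c), col(c))): _*))
).show(truncate = false)
```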
Upvotes: 0