Reputation: 8936
My Spark SQL and Scala code:
val df = spark.sql(
  """
    |SELECT id, a, b, c, d
    |FROM default.table
  """.stripMargin)
val grouped_df = df.withColumn("map", struct("a", "b", "c", "d"))
The output of grouped_df is:
{
"id": 41286786,
"map": {
"a": "",
"b": "724",
"c": "7425",
"d": ""
}
}
How can I convert grouped_df to the following output?
{
"id": 41286786,
"array": [
{ "name": "b", "value": "724" },
{ "name": "c", "value": "7425" }
]
}
How can I do this in Spark SQL or with a UDF?
Upvotes: 1
Views: 1000
Reputation: 3344
Here is how you can do it with the DataFrame API in Scala (natively, no UDF needed):
import org.apache.spark.sql.functions.{array, struct, lit}
import spark.implicits._ // needed for the $"..." column syntax

val result = grouped_df
  .select(
    $"id",
    array(
      struct(lit("b").alias("name"), $"map.b".alias("value")),
      struct(lit("c").alias("name"), $"map.c".alias("value"))
    ).alias("array")
  )
Upvotes: 2