BAE

Reputation: 8936

Spark: convert struct/dictionary to array of structs/dictionaries

My Spark SQL and Scala code:

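import org.apache.spark.sql.functions.struct

val df = spark.sql(
  """
    |SELECT id, a, b, c, d
    |FROM default.table
  """.stripMargin)

// Pack the four columns into a single struct column named "map"
val grouped_df = df.withColumn("map", struct("a", "b", "c", "d"))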

The output of grouped_df:

{
  "id": 41286786,
  "map": {
    "a": "",
    "b": "724",
    "c": "7425",
    "d": ""
  }
}

How can I get the following output, i.e. convert grouped_df to:

{
  "id": 41286786,
  "array": [
    { "name": "b", "value": "724" },
    { "name": "c", "value": "7425" }
  ]
}

How can I do this in Spark SQL or with a UDF?

Upvotes: 1

Views: 1000

Answers (1)

David Vrba

Reputation: 3344

Here is how you can do it with the DataFrame API in Scala, natively and without a UDF. This example lists the fields b and c explicitly:

import org.apache.spark.sql.functions.{array, struct, lit}
import spark.implicits._ // enables the $"..." column syntax (already in scope in spark-shell)

val result = grouped_df
  .select(
    $"id",
    array(
      struct(lit("b").alias("name"), $"map.b".alias("value")),
      struct(lit("c").alias("name"), $"map.c".alias("value"))
    ).alias("array")
  )
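The expected output in the question appears to omit the fields whose value is empty (a and d). If that is the intent, one possible variation is to build a (name, value) struct for every field and then drop the empty ones with the SQL higher-order function filter, which should be available from Spark 2.4 on. The sketch below is only an illustration: the local SparkSession, the sample row, and the names fields and pairs are assumptions made here so that the snippet is self-contained, and all fields are assumed to be strings.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, expr, lit, struct}

// Local session for this sketch only; in spark-shell a `spark` session already exists.
val spark = SparkSession.builder().master("local[*]").appName("struct-to-array").getOrCreate()
import spark.implicits._

// Sample row mirroring the question's data (assumption: every field is a string).
val grouped_df = Seq((41286786L, "", "724", "7425", ""))
  .toDF("id", "a", "b", "c", "d")
  .select(col("id"), struct("a", "b", "c", "d").alias("map"))

// Field names to unpack from the "map" struct; hardcoded here as an assumption.
val fields = Seq("a", "b", "c", "d")

// Build one (name, value) struct per field and collect them into an array column.
val pairs = array(fields.map(f => struct(lit(f).alias("name"), col(s"map.$f").alias("value"))): _*)

// Keep only the entries whose value is neither null nor an empty string.
val result = grouped_df
  .withColumn("pairs", pairs)
  .select(col("id"), expr("filter(pairs, x -> x.value IS NOT NULL AND x.value != '')").alias("array"))

result.show(false)
```

The same filter(...) expression can also be used inside a spark.sql query, so this approach should not need a UDF either. With the original grouped_df from the question, only the part from val fields onward would be needed.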

Upvotes: 2
