clay

Reputation: 20390

Convert Spark DataFrame Map into Array of Maps of `{"Key": key, "Value": value}`

How can I take a Spark DataFrame structured like this:

import scala.collection.JavaConverters._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val sourcedf = spark.createDataFrame(
  List(
    Row(Map("AL" -> "Alabama", "AK" -> "Alaska").asJava),
    Row(Map("TX" -> "Texas", "FL" -> "Florida", "NJ" -> "New Jersey").asJava)
  ).asJava, StructType(
    StructField("my_map", MapType(StringType, StringType, false)) ::
    Nil))

or, in text form, sourcedf.show(false) shows:

+----------------------------------------------+
|my_map                                        |
+----------------------------------------------+
|[AL -> Alabama, AK -> Alaska]                 |
|[TX -> Texas, FL -> Florida, NJ -> New Jersey]|
+----------------------------------------------+

and programmatically transform it into this structure:

val targetdf = spark.createDataFrame(
  List(
    Row(List(Map("Key" -> "AL", "Value" -> "Alabama"), Map("Key" -> "AK", "Value" -> "Alaska")).asJava),
    Row(List(Map("Key" -> "TX", "Value" -> "Texas"), Map("Key" -> "FL", "Value" -> "Florida"), Map("Key" -> "NJ", "Value" -> "New Jersey")).asJava)
  ).asJava, StructType(
    StructField("my_list", ArrayType(MapType(StringType, StringType, false), false)) ::
    Nil))

or, in text form, targetdf.show(false) shows:

+----------------------------------------------------------------------------------------------+
|my_list                                                                                       |
+----------------------------------------------------------------------------------------------+
|[[Key -> AL, Value -> Alabama], [Key -> AK, Value -> Alaska]]                                 |
|[[Key -> TX, Value -> Texas], [Key -> FL, Value -> Florida], [Key -> NJ, Value -> New Jersey]]|
+----------------------------------------------------------------------------------------------+

Upvotes: 2

Views: 2116

Answers (1)

AJY

Reputation: 188

Whilst using Scala, I couldn't figure out how to handle a java.util.Map with the provided Encoders; I probably would have had to write one myself, and I figured that was too much work.

However, I can see two ways to do this that avoid java.util.Map entirely and work with scala.collection.immutable.Map instead.

First, you could convert the DataFrame into a typed Dataset and map over its rows:

import org.apache.spark.sql.Dataset
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

case class Foo(my_map: Map[String, String])

case class Bar(my_list: List[Map[String, String]])

implicit val fooEncoder = ExpressionEncoder[Foo]
implicit val barEncoder = ExpressionEncoder[Bar]

// Rebuild each (k, v) entry of the map as its own single-pair Map("Key" -> k, "Value" -> v)
val ds: Dataset[Foo] = sourcedf.as[Foo]
val output: Dataset[Bar] = ds.map(x => Bar(x.my_map.map({ case (k, v) => Map("Key" -> k, "Value" -> v) }).toList))
output.show(false)
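The typed approach buys compile-time checking of the row shape, at the cost of declaring case classes for both the input and output schemas.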

Alternatively, you can use a UDF:

import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf
import spark.implicits._

// The same transformation, packaged as a UDF and applied to the my_map column
val mapToList: Map[String, String] => List[Map[String, String]] = {
  x => x.map({ case (k, v) => Map("Key" -> k, "Value" -> v) }).toList
}
val mapToListUdf: UserDefinedFunction = udf(mapToList)
val output: Dataset[Row] = sourcedf.select(mapToListUdf($"my_map").as("my_list"))
output.show(false)
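For what it's worth, on Spark 2.4 or later the same reshaping can also be expressed with only built-in functions, avoiding encoders and UDFs entirely. A minimal sketch, assuming the sourcedf from the question (the map_entries and transform higher-order functions are invoked through expr):

import org.apache.spark.sql.functions.expr

// map_entries turns the map into an array of (key, value) structs;
// transform then rebuilds each struct as a two-entry map
val builtin = sourcedf.select(
  expr("transform(map_entries(my_map), e -> map('Key', e.key, 'Value', e.value))").as("my_list")
)
builtin.show(false)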

Both approaches output:

+----------------------------------------------------------------------------------------------+
|my_list                                                                                       |
+----------------------------------------------------------------------------------------------+
|[[Key -> AL, Value -> Alabama], [Key -> AK, Value -> Alaska]]                                 |
|[[Key -> TX, Value -> Texas], [Key -> FL, Value -> Florida], [Key -> NJ, Value -> New Jersey]]|
+----------------------------------------------------------------------------------------------+

Upvotes: 1
