Reputation: 20390
How can I take a Spark DataFrame structured like this:
import scala.collection.JavaConverters._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val sourcedf = spark.createDataFrame(
  List(
    Row(Map("AL" -> "Alabama", "AK" -> "Alaska").asJava),
    Row(Map("TX" -> "Texas", "FL" -> "Florida", "NJ" -> "New Jersey").asJava)
  ).asJava,
  StructType(
    StructField("my_map", MapType(StringType, StringType, false)) ::
    Nil))
or, in text form, sourcedf.show(false) shows:
+----------------------------------------------+
|my_map |
+----------------------------------------------+
|[AL -> Alabama, AK -> Alaska] |
|[TX -> Texas, FL -> Florida, NJ -> New Jersey]|
+----------------------------------------------+
and programmatically transform it into this structure:
val targetdf = spark.createDataFrame(
  List(
    Row(List(Map("Key" -> "AL", "Value" -> "Alabama"), Map("Key" -> "AK", "Value" -> "Alaska")).asJava),
    Row(List(Map("Key" -> "TX", "Value" -> "Texas"), Map("Key" -> "FL", "Value" -> "Florida"), Map("Key" -> "NJ", "Value" -> "New Jersey")).asJava)
  ).asJava,
  StructType(
    StructField("my_list", ArrayType(MapType(StringType, StringType, false), false)) ::
    Nil))
or, in text form, targetdf.show(false) shows:
+----------------------------------------------------------------------------------------------+
|my_list |
+----------------------------------------------------------------------------------------------+
|[[Key -> AL, Value -> Alabama], [Key -> AK, Value -> Alaska]] |
|[[Key -> TX, Value -> Texas], [Key -> FL, Value -> Florida], [Key -> NJ, Value -> New Jersey]]|
+----------------------------------------------------------------------------------------------+
Upvotes: 2
Views: 2116
Reputation: 188
Whilst using Scala, I couldn't figure out how to handle a java.util.Map with the provided Encoders; I probably would have had to write one myself, and I figured that was too much work. However, I can see two ways to do this without java.util.Map, using scala.collection.immutable.Map instead.
You could convert into a typed Dataset and map over it, flattening each row's map into a list of single-entry maps:
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import spark.implicits._

case class Foo(my_map: Map[String, String])
case class Bar(my_list: List[Map[String, String]])

// Explicit encoder for the nested collection type
implicit val encoder = ExpressionEncoder[List[Map[String, String]]]()

val ds: Dataset[Foo] = sourcedf.as[Foo]
// Turn each (k, v) entry into its own single-entry map
val output: Dataset[Bar] = ds.map(x => Bar(x.my_map.map { case (k, v) => Map("key" -> k, "value" -> v) }.toList))
output.show(false)
Or you can use a UDF
val mapToList: Map[String, String] => List[Map[String, String]] = {
x => x.flatMap({case (k, v) => List(Map("key" -> k, "value" -> v))}).toList
}
val mapToListUdf: UserDefinedFunction = udf(mapToList)
val output: Dataset[Row] = sourcedf.select(mapToListUdf($"my_map").as("my_list"))
output.show(false)
Both approaches output:
+----------------------------------------------------------------------------------------------+
|my_list |
+----------------------------------------------------------------------------------------------+
|[[key -> AL, value -> Alabama], [key -> AK, value -> Alaska]] |
|[[key -> TX, value -> Texas], [key -> FL, value -> Florida], [key -> NJ, value -> New Jersey]]|
+----------------------------------------------------------------------------------------------+
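As a third route (a sketch, assuming Spark 2.4 or later, where the map_entries and transform higher-order SQL functions are available), you can stay entirely in the DataFrame API with no case classes, custom encoders, or UDFs:

```scala
import org.apache.spark.sql.functions.expr

// map_entries turns map<k,v> into array<struct<key,value>>;
// transform then rebuilds each struct entry as a single-entry map.
val output = sourcedf.select(
  expr("transform(map_entries(my_map), e -> map('key', e.key, 'value', e.value))").as("my_list"))
output.show(false)
```

Because this stays in Catalyst expressions rather than opaque JVM functions, Spark can optimize it, which typically makes it cheaper than the UDF version.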
Upvotes: 1