Reputation: 331
I have the following spark dataframe and its corresponding schema
+----+--------------------+
|name| subject_list|
+----+--------------------+
| Tom|[[Math -> 99], [P...|
| Amy| [[Physics -> 77]]|
+----+--------------------+
root
|-- name: string (nullable = true)
|-- subject_list: array (nullable = true)
| |-- element: map (containsNull = true)
| | |-- key: string
| | |-- value: integer (valueContainsNull = false)
How can I dump this dataframe into a csv file separated by "\t", as follows?
Tom [(Math, 99), (Physics, 88)]
Amy [(Physics, 77)]
Here's a link to a similar post to this question, but it covers dumping an array of strings, not an array of maps.
I'd appreciate any help, thanks.
Upvotes: 2
Views: 1133
Reputation: 23109
You can write a UDF to convert the Map to a string formatted the way you want, like:
val mapToString = udf((marks: Map[String, String]) => {
marks.map{case (k, v) => (s"(${k},${v})")}.mkString("[",",", "]")
})
dff.withColumn("marks", mapToString($"marks"))
.write.option("delimiter", "\t")
.csv("csvoutput")
Output:
Tom [(Math,99),(Physics,88)]
Amy [(Physics,77)]
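The formatting logic inside the UDF is plain Scala, so you can sanity-check it without a Spark session (the helper name below is just for illustration):

```scala
// Same body as the mapToString UDF above, extracted for a quick check.
def mapToStringBody(marks: Map[String, String]): String =
  marks.map { case (k, v) => s"(${k},${v})" }.mkString("[", ",", "]")

println(mapToStringBody(Map("Math" -> "99", "Physics" -> "88")))
// prints [(Math,99),(Physics,88)]
```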
But I don't recommend doing this: you'll have problems reading the file back, since you'll have to parse the strings manually. It's better to flatten those maps, like:
dff.select($"name", explode($"marks")).write.csv("csvNewoutput")
Which will store as
Tom,Math,99
Tom,Physics,88
Amy,Physics,77
Upvotes: 1
Reputation: 3590
The reason it throws an error, along with other details, is covered in the same link that you shared. Here is a modified version of stringify
for an array of maps:
def stringify = udf((vs: Seq[Map[String, Int]]) => vs match {
case null => null
case x => "[" + x.flatMap(_.toList).mkString(",") + "]"
})
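Since the match runs on an ordinary Seq[Map[String, Int]], you can verify the formatting against the sample rows without Spark (the helper name is just for illustration):

```scala
// Same body as the stringify UDF above, checked on the question's sample data.
def stringifyBody(vs: Seq[Map[String, Int]]): String = vs match {
  case null => null
  case x    => "[" + x.flatMap(_.toList).mkString(",") + "]"
}

println(stringifyBody(Seq(Map("Math" -> 99), Map("Physics" -> 88))))
// prints [(Math,99),(Physics,88)]
```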
credits: link
Upvotes: 2