Bagel912

Reputation: 331

Dump array of map column of a spark dataframe into csv file

I have the following spark dataframe and its corresponding schema

+----+--------------------+
|name|        subject_list|
+----+--------------------+
| Tom|[[Math -> 99], [P...|
| Amy|   [[Physics -> 77]]|
+----+--------------------+

root
 |-- name: string (nullable = true)
 |-- subject_list: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: integer (valueContainsNull = false)

How can I dump this dataframe into a csv file separated by "\t", as follows?

Tom    [(Math, 99), (Physics, 88)]
Amy    [(Physics, 77)]

Here's a link to a similar post to this question, but it is for dumping an array of strings, not an array of maps.

I'd appreciate any help, thanks.
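For reference, the formatting being asked for can be sketched in plain Scala, modeling one row's subject_list value as a Seq[Map[String, Int]] (the variable names here are illustrative, not from the question):

```scala
// Model of one row's subject_list value: an array of single-entry maps
val tomSubjects: Seq[Map[String, Int]] = Seq(Map("Math" -> 99), Map("Physics" -> 88))

// Desired CSV cell: "[(Math, 99), (Physics, 88)]"
val rendered = tomSubjects
  .flatMap(_.toList)                   // flatten to a list of (subject, score) pairs
  .map { case (k, v) => s"($k, $v)" }  // format each pair
  .mkString("[", ", ", "]")            // wrap in brackets, comma-separate

println(rendered)  // [(Math, 99), (Physics, 88)]
```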

Upvotes: 2

Views: 1133

Answers (2)

koiralo

Reputation: 23109

You can write a UDF to convert the array of maps to a string in the format you want, like:

val mapToString = udf((subjects: Seq[Map[String, Int]]) => {
  subjects.flatMap(_.toList)
    .map { case (k, v) => s"(${k},${v})" }
    .mkString("[", ",", "]")
})

dff.withColumn("subject_list", mapToString($"subject_list"))
  .write.option("delimiter", "\t")
  .csv("csvoutput")

Output:

Tom [(Math,99),(Physics,88)]
Amy [(Physics,77)]

But I don't recommend doing this: you will have problems when reading the file back and will have to parse the string manually. It's better to flatten those maps, as:

dff.select($"name", explode($"subject_list").as("subject"))
  .select($"name", explode($"subject"))
  .write.csv("csvNewoutput")

Which will store as

Tom,Math,99
Tom,Physics,88
Amy,Physics,77
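One reason the flattened layout is easier to work with: the original grouping can be rebuilt with an ordinary group-by. Sketched here in plain Scala on the flattened rows (in Spark proper you would use groupBy with collect_list; the names below are illustrative):

```scala
// Flattened rows as written to CSV: (name, subject, score)
val rows = Seq(("Tom", "Math", 99), ("Tom", "Physics", 88), ("Amy", "Physics", 77))

// Rebuild name -> list of (subject, score); groupBy keeps per-name row order
val regrouped: Map[String, Seq[(String, Int)]] =
  rows.groupBy(_._1).map { case (name, rs) => name -> rs.map(r => (r._2, r._3)) }

println(regrouped("Tom"))  // List((Math,99), (Physics,88))
```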

Upvotes: 1

vdep

Reputation: 3590

The reason why it throws an error, along with other details, is covered in the same link that you have shared. Here is a modified version of stringify for an array of maps:

def stringify = udf((vs: Seq[Map[String, Int]]) => vs match {
  case null => null  // preserve nulls rather than failing
  // tuples render as "(key,value)" by default, giving "[(Math,99),(Physics,88)]"
  case x => "[" + x.flatMap(_.toList).mkString(",") + "]"
})
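As a sanity check, the body of that UDF can be exercised in plain Scala, since a tuple's default toString already yields the (key,value) form (the function name here is illustrative):

```scala
// Same logic as the UDF body, as a plain function for testing
def stringifyBody(vs: Seq[Map[String, Int]]): String = vs match {
  case null => null
  case x    => "[" + x.flatMap(_.toList).mkString(",") + "]"
}

println(stringifyBody(Seq(Map("Math" -> 99), Map("Physics" -> 88))))
// [(Math,99),(Physics,88)]
println(stringifyBody(null))  // null
```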

credits: link

Upvotes: 2
