Bagel912

Reputation: 331

Dump array of map column of a spark dataframe into csv file

I have the following spark dataframe and its corresponding schema

+----+--------------------+
|name|        subject_list|
+----+--------------------+
| Tom|[[Math -> 99], [P...|
| Amy|   [[Physics -> 77]]|
+----+--------------------+

root
 |-- name: string (nullable = true)
 |-- subject_list: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: integer (valueContainsNull = false)

How can I dump this dataframe into a csv file separated by "\t", as follows?

Tom    [(Math, 99), (Physics, 88)]
Amy    [(Physics, 77)]

Here's a link to a similar post to this question, but it is for dumping an array of strings, not an array of maps.

I'd appreciate any help, thanks.
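For reference, the formatting being asked for can be sketched in plain Scala, modeling one row's subject_list value as a Seq[Map[String, Int]] (the variable names here are illustrative, not from the question):

```scala
// Model of one row's subject_list value: an array of single-entry maps
val tomSubjects: Seq[Map[String, Int]] = Seq(Map("Math" -> 99), Map("Physics" -> 88))

// Desired CSV cell: "[(Math, 99), (Physics, 88)]"
val rendered = tomSubjects
  .flatMap(_.toList)                   // flatten to a list of (subject, score) pairs
  .map { case (k, v) => s"($k, $v)" }  // format each pair
  .mkString("[", ", ", "]")            // wrap in brackets, comma-separate

println(rendered)  // [(Math, 99), (Physics, 88)]
```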

Upvotes: 2

Views: 1133

Answers (2)

koiralo

Reputation: 23109

You can write a UDF to convert the array of maps to a string in the format you want, like:

val mapToString = udf((subjects: Seq[Map[String, Int]]) => {
  subjects.flatMap(_.toList)
    .map { case (k, v) => s"(${k},${v})" }
    .mkString("[", ",", "]")
})

dff.withColumn("subject_list", mapToString($"subject_list"))
  .write.option("delimiter", "\t")
  .csv("csvoutput")

Output:

Tom [(Math,99),(Physics,88)]
Amy [(Physics,77)]

But I don't recommend doing this: you will have problems when reading the file back and will have to parse the string manually. It's better to flatten those maps, as:

dff.select($"name", explode($"subject_list").as("subject"))
  .select($"name", explode($"subject"))
  .write.csv("csvNewoutput")

Which will store as

Tom,Math,99
Tom,Physics,88
Amy,Physics,77
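One reason the flattened layout is easier to work with: the original grouping can be rebuilt with an ordinary group-by. Sketched here in plain Scala on the flattened rows (in Spark proper you would use groupBy with collect_list; the names below are illustrative):

```scala
// Flattened rows as written to CSV: (name, subject, score)
val rows = Seq(("Tom", "Math", 99), ("Tom", "Physics", 88), ("Amy", "Physics", 77))

// Rebuild name -> list of (subject, score); groupBy keeps per-name row order
val regrouped: Map[String, Seq[(String, Int)]] =
  rows.groupBy(_._1).map { case (name, rs) => name -> rs.map(r => (r._2, r._3)) }

println(regrouped("Tom"))  // List((Math,99), (Physics,88))
```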

Upvotes: 1

vdep

Reputation: 3590

The reason why it throws an error, along with other details, is covered in the same link that you have shared. Here is a modified version of stringify for an array of maps:

def stringify = udf((vs: Seq[Map[String, Int]]) => vs match {
  case null => null  // preserve nulls rather than failing
  // tuples render as "(key,value)" by default, giving "[(Math,99),(Physics,88)]"
  case x => "[" + x.flatMap(_.toList).mkString(",") + "]"
})
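As a sanity check, the body of that UDF can be exercised in plain Scala, since a tuple's default toString already yields the (key,value) form (the function name here is illustrative):

```scala
// Same logic as the UDF body, as a plain function for testing
def stringifyBody(vs: Seq[Map[String, Int]]): String = vs match {
  case null => null
  case x    => "[" + x.flatMap(_.toList).mkString(",") + "]"
}

println(stringifyBody(Seq(Map("Math" -> 99), Map("Physics" -> 88))))
// [(Math,99),(Physics,88)]
println(stringifyBody(null))  // null
```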

credits: link

Upvotes: 2
