Reputation: 810
I have an RDD that contains several data structures, one of which is a Map[String, Int].
To make this concrete, after a map transformation I end up with the following:
val data = ... // This is a RDD[Map[String, Int]]
In one of the elements of this RDD, the Map contains the following:
*key value*
map_id -> 7753
Oscar -> 39
Jaden -> 13
Thomas -> 1
Chris -> 52
Other elements of the RDD contain other names and numbers, and each map has its own map_id. If I simply call data.saveAsTextFile(path), I get the following output in my file:
Map(map_id -> 7753, Oscar -> 39, Jaden -> 13, Thomas -> 1, Chris -> 52)
Map(...)
Map(...)
However, I would like to format it as follows:
---------------------------
map_id: 7753
---------------------------
Oscar: 39
Jaden: 13
Thomas: 1
Chris: 52
---------------------------
map_id: <some other id>
---------------------------
Name: nbr
Name2: nbr2
Basically, the map_id acts as a kind of header, followed by the contents, then one blank line, and then the next element.
My question: an RDD seems to offer only two options, saving as a text file or as an object file, and as far as I can see neither lets me customize the formatting. How could I go about doing this?
Upvotes: 0
Views: 1659
Reputation: 35229
You can just map each element to a String and write the result. For example:
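// Render one map as a header block followed by "key: value" lines.
def format(map: Map[String, Int]): String = {
  // Fall back to "unknown" when the map has no map_id entry.
  val id = map.get("map_id").map(_.toString).getOrElse("unknown")
  // Every entry except map_id becomes one "key: value" line.
  val content = map.collect {
    case (k, v) if k != "map_id" => s"$k: $v"
  }.mkString("\n")
  // The trailing newline leaves one blank line between records in the output.
  s"""|---------------------------
      |map_id: $id
      |---------------------------
      |$content
      |""".stripMargin
}
data.map(format(_)).saveAsTextFile(path)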
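One caveat: a plain Scala Map does not guarantee iteration order, so the names may not come out in the order shown in the question. Below is a minimal sketch of a variant that sorts the names alphabetically, so the output is deterministic and can be checked locally without Spark. The names FormatSketch, formatSorted and Rule are hypothetical, introduced only for this illustration.

```scala
object FormatSketch {
  // Separator line: 27 dashes, matching the layout requested in the question.
  private val Rule = "-" * 27

  // Hypothetical variant of format: sorts entries by name for stable output.
  def formatSorted(m: Map[String, Int]): String = {
    // Fall back to "unknown" when the map has no map_id entry.
    val id = m.get("map_id").map(_.toString).getOrElse("unknown")
    val content = (m - "map_id").toSeq
      .sortBy { case (name, _) => name }
      .map { case (name, n) => s"$name: $n" }
      .mkString("\n")
    // The trailing newline yields a blank line between records once saved.
    s"$Rule\nmap_id: $id\n$Rule\n$content\n"
  }

  def main(args: Array[String]): Unit = {
    // Sample data taken from the question.
    val sample = Map("map_id" -> 7753, "Oscar" -> 39, "Jaden" -> 13,
                     "Thomas" -> 1, "Chris" -> 52)
    print(formatSorted(sample))
  }
}
```

Assuming data is the RDD[Map[String, Int]] from the question, it should plug in the same way: data.map(FormatSketch.formatSorted).saveAsTextFile(path). Note that saveAsTextFile writes a directory of part files, one per partition, rather than a single file.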
Upvotes: 4