Reputation: 273
I'm trying to find the best way to convert an entire Spark DataFrame to a Scala Map collection. It is best illustrated as follows:
To go from this (in the Spark examples):
val df = sqlContext.read.json("examples/src/main/resources/people.json")
df.show
+----+-------+
| age| name|
+----+-------+
|null|Michael|
| 30| Andy|
| 19| Justin|
+----+-------+
To a Scala collection (Map of Maps) represented like this:
val people = Map(
Map("age" -> null, "name" -> "Michael"),
Map("age" -> 30, "name" -> "Andy"),
Map("age" -> 19, "name" -> "Justin")
)
Upvotes: 15
Views: 29619
Reputation: 31
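// Pair the first column with the second for every row, then build a single Map
val map = df.collect.map(a => a(0) -> a(1)).toMap.asInstanceOf[Map[String, String]]
Use this if the result is needed as a single Map instead of an Array of Maps.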
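One caveat with the one-liner above: the `asInstanceOf` cast is unchecked, so a value that is not really a `String` would only fail later, when it is read. As a minimal sketch in plain Scala (no Spark session; `PairsToMapSketch` and `rows` are hypothetical stand-ins for the collected rows, and the ages are assumed to be `Int`), a pattern match can build a typed map without the cast:

```scala
object PairsToMapSketch {
  // Hypothetical stand-in for df.collect: each row is Seq(age, name)
  val rows: Seq[Seq[Any]] = Seq(
    Seq(null, "Michael"),
    Seq(30, "Andy"),
    Seq(19, "Justin")
  )

  // Untyped version: column 0 -> column 1, with no cast
  val untyped: Map[Any, Any] = rows.map(r => r(0) -> r(1)).toMap

  // Typed version: name -> age, where a null or non-Int age becomes None
  val ageByName: Map[String, Option[Int]] = rows.collect {
    case Seq(age, name: String) =>
      name -> Option(age).collect { case n: Int => n }
  }.toMap

  def main(args: Array[String]): Unit = {
    println(untyped)
    println(ageByName)
  }
}
```

Rows whose name is not a `String` are silently dropped by `collect` in this sketch; whether that is acceptable depends on the data.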
Upvotes: 1
Reputation: 7249
First, get the column names from the DataFrame's schema:
// Pair each column name with its position in the schema
val schemaList = dataframe.schema.map(_.name).zipWithIndex
Then get the RDD from the DataFrame and map over it:
dataframe.rdd.map(row =>
  // here rec._1 is the column name and rec._2 is its index
  schemaList.map(rec => (rec._1, row(rec._2))).toMap
).collect.foreach(println)
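To see what the index-based lookup does without a Spark session, here is a minimal sketch in plain Scala. Rows are modeled as `Seq[Any]`; `SchemaIndexSketch`, `columnNames`, `rows`, and `rowToMap` are hypothetical names standing in for the schema and the row data, not Spark APIs.

```scala
object SchemaIndexSketch {
  // Hypothetical stand-ins for the DataFrame's column names and its rows
  val columnNames: Seq[String] = Seq("age", "name")
  val rows: Seq[Seq[Any]] = Seq(
    Seq(null, "Michael"),
    Seq(30, "Andy"),
    Seq(19, "Justin")
  )

  // Pair each column name with its position, like schemaList above
  val schemaList: Seq[(String, Int)] = columnNames.zipWithIndex

  // Look up each value by position and build one Map per row
  def rowToMap(row: Seq[Any]): Map[String, Any] =
    schemaList.map { case (name, idx) => name -> row(idx) }.toMap

  val result: Seq[Map[String, Any]] = rows.map(rowToMap)

  def main(args: Array[String]): Unit = result.foreach(println)
}
```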
Upvotes: 5
Reputation: 13927
I don't think your question makes sense as written. In your outermost Map, I only see you trying to stuff values into it, but you need key/value pairs in your outermost Map. That being said:
val peopleArray = df.collect.map(r => Map(df.columns.zip(r.toSeq):_*))
Will give you:
Array(
Map("age" -> null, "name" -> "Michael"),
Map("age" -> 30, "name" -> "Andy"),
Map("age" -> 19, "name" -> "Justin")
)
At that point you could do:
val people = Map(peopleArray.map(p => (p.getOrElse("name", null), p)):_*)
Which would give you:
Map(
("Michael" -> Map("age" -> null, "name" -> "Michael")),
("Andy" -> Map("age" -> 30, "name" -> "Andy")),
("Justin" -> Map("age" -> 19, "name" -> "Justin"))
)
I'm guessing this is really more what you want. If you wanted to key them on an arbitrary Int index, you can do:
val indexedPeople = Map(peopleArray.zipWithIndex.map(r => (r._2, r._1)):_*)
Which gives you:
Map(
(0 -> Map("age" -> null, "name" -> "Michael")),
(1 -> Map("age" -> 30, "name" -> "Andy")),
(2 -> Map("age" -> 19, "name" -> "Justin"))
)
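One thing to watch when choosing the key: a Map holds one value per key, so keying by a column that is not unique drops rows, while keying by position never collides. A minimal sketch in plain Scala, assuming a hypothetical `peopleArray` with a duplicate name added on purpose (`KeyedMapsSketch` and the `groupBy` variant are illustrative additions, not part of the answer above):

```scala
object KeyedMapsSketch {
  // Hypothetical stand-in for peopleArray, with two rows named "Andy"
  val peopleArray: Array[Map[String, Any]] = Array(
    Map("age" -> null, "name" -> "Michael"),
    Map("age" -> 30, "name" -> "Andy"),
    Map("age" -> 19, "name" -> "Andy")
  )

  // Keying by name: the later "Andy" replaces the earlier one
  val byName: Map[Any, Map[String, Any]] =
    peopleArray.map(p => p.getOrElse("name", null) -> p).toMap

  // Keying by position: zipWithIndex on an Array yields Int indices
  val byIndex: Map[Int, Map[String, Any]] =
    peopleArray.zipWithIndex.map { case (p, i) => i -> p }.toMap

  // Grouping keeps every row that shares a key
  val grouped: Map[Any, Array[Map[String, Any]]] =
    peopleArray.groupBy(p => p.getOrElse("name", null))

  def main(args: Array[String]): Unit = {
    println(byName.size)
    println(byIndex.size)
    println(grouped("Andy").length)
  }
}
```

If the key column can repeat, the grouped form or the index form is the safer choice.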
Upvotes: 31