Reputation: 33
I have the DataFrame below:
val df = phDF.groupBy("name").agg(collect_list("message").as("Messages"))
It produces the following output:
+-----------+--------------------+
|name |Messages |
+-----------+--------------------+
| Test1 |['A','B','C'] |
| Test2 |['A','B','C','D'] |
| Test3 |['A','B'] |
+-----------+--------------------+
Now I want to put each name (as the key) and its Messages (as the value) into a HashMap.
I tried converting the DataFrame to an RDD as shown below, but it did not work:
var m = scala.collection.mutable.Map[String, String]()
val rdd = df.rdd.map(_.mkString("##"))
val rdd1 = rdd.map(s=>s.split("##"))
val rdd2 = rdd1.map(ele=>m.put(ele(0),ele(1)))
print(m) // Output:- HashMap()
As shown above, the HashMap is empty when I print it.
Can anyone help me store these values in a HashMap like the one below?
Map("Test1" -> "['A','B','C']" ,"Test2" -> "['A','B','C','D']","Test3" -> "['A','B']")
Upvotes: 0
Views: 202
Reputation: 1220
Given your initial data:
val df = Seq(
  ("test1", Seq("A", "B", "C")),
  ("test2", Seq("A", "B", "C", "D")),
).toDF("name", "Messages")
You can convert each row into a map with the map_from_entries function:
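import org.apache.spark.sql.functions.{array, map_from_entries, struct}

val asMapDf = df.select(
  map_from_entries(
    array(
      struct("name", "Messages") // one (key, value) entry per row
    )
  ).as("map") // name the column so it matches the output below
)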
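Note that you create an array of struct items with two fields: the first becomes the key and the second the value. Each entry in the array becomes an entry in the map, so every row here yields a single-entry map column. This gives you: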
+-----------------------+
|map |
+-----------------------+
|{test1 -> [A, B, C]} |
|{test2 -> [A, B, C, D]}|
+-----------------------+
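If the goal is a single Scala Map on the driver rather than a map column per row, one option is to collect the rows and build the map locally. The attempt in the question leaves m empty because RDD transformations are lazy (no action is ever called on rdd2), and even with an action the closure would generally update a serialized copy of m on the executors, not the driver's instance. The sketch below is only an illustration: the object name CollectToMap, the helper toLocalMap and the local SparkSession settings are hypothetical, and it assumes the grouped result is small enough to fit in driver memory.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, not part of the original post.
object CollectToMap {

  // Pull every row to the driver and build an immutable Map from it.
  // Assumes column 0 is the string key and column 1 is an array of strings.
  def toLocalMap(df: DataFrame): Map[String, Seq[String]] =
    df.collect()
      .map(row => row.getString(0) -> row.getSeq[String](1))
      .toMap

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; reuse your existing session in practice.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-to-map")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("Test1", Seq("A", "B", "C")),
      ("Test2", Seq("A", "B", "C", "D")),
      ("Test3", Seq("A", "B"))
    ).toDF("name", "Messages")

    val result = toLocalMap(df)
    println(result)

    spark.stop()
  }
}
```

If the values should be plain strings as in the question, something like result.map { case (k, v) => k -> v.mkString("[", ",", "]") } would produce a Map[String, String].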
Upvotes: 1