Dhrumil Shah

Reputation: 33

Spark: How to add values to a HashMap from an RDD?

I have the DataFrame below:

val df = phDF.groupBy("name").agg(collect_list("message").as("Messages"))

which gives the following output:

+-----------+--------------------+
|name       |Messages            |
+-----------+--------------------+
|     Test1 |['A','B','C']       |
|     Test2 |['A','B','C','D']   |
|     Test3 |['A','B']           |
+-----------+--------------------+

Now I want to put the above name (as the key) and Messages (as the value) into a HashMap.

I used the approach below to convert it into an RDD, but I'm not getting anywhere:

var m = scala.collection.mutable.Map[String, String]()
val rdd = df.rdd.map(_.mkString("##"))
val rdd1 = rdd.map(s=>s.split("##"))
val rdd2 = rdd1.map(ele=>m.put(ele(0),ele(1)))
print(m)   // Output:- HashMap()

As shown above, when I try to print the HashMap it comes back empty.

Can anyone help me store these values in a HashMap, like below?

Map("Test1" -> "['A','B','C']" ,"Test2" -> "['A','B','C','D']","Test3" -> "['A','B']")

Upvotes: 0

Views: 202

Answers (1)

Jarrod Baker

Reputation: 1220

Given your initial data:

val df = Seq(
  ("test1", Seq("A", "B", "C")),
  ("test2", Seq("A", "B", "C", "D")),
).toDF("name", "Messages")

You can convert it into a map column with the map_from_entries function:

import org.apache.spark.sql.functions.{array, map_from_entries, struct}

val asMapDf = df.select(
  map_from_entries(
    array(
      struct("name", "Messages")
    )
  ).as("map")
)

Note that you create an array of struct items with two fields; each entry in the array becomes one key/value entry in the map. This gives you:

+-----------------------+
|map                    |
+-----------------------+
|{test1 -> [A, B, C]}   |
|{test2 -> [A, B, C, D]}|
+-----------------------+
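As an aside, the original attempt printed an empty map because `m.put` runs inside a lazy RDD transformation: the closure (including a copy of `m`) is shipped to the executors, so the driver-side map is never updated, and without an action the `map` never even executes. If you need the result as a plain Scala `Map` on the driver, as the question asks, one option is to collect the grouped rows and build the map locally. This is a sketch that assumes the grouped DataFrame is small enough to fit in driver memory:

```scala
// Collect (name, Messages) pairs to the driver and build an immutable Map.
// Assumes df has schema (name: String, Messages: Seq[String]) and fits in memory.
val nameToMessages: Map[String, Seq[String]] = df
  .collect()
  .map(row => row.getString(0) -> row.getSeq[String](1))
  .toMap
```

Since the result lives entirely on the driver, this should only be used when the number of distinct names is modest.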

Upvotes: 1
