Devesh Kumar

Reputation: 73

How to convert a Dataframe to a Map with one column's values as the key?

I am a newbie in Spark/Scala. I have a dataframe like the one below:

Col1 | Col2
-----|-----
 a   |  1
 a   |  2
 a   |  3
 b   |  4
 b   |  5

I want to create a map like this:

a-> [1,2,3]
b-> [4,5]

I am having trouble combining the Col2 values based on the Col1 value and then creating a map with the Col1 value as the key.
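For reference, the desired result has the same shape as a groupBy on plain Scala collections (no Spark involved); this sketch just makes the target concrete:

```scala
// Plain-Scala sketch of the target shape: group the pairs by the
// first element and keep only the second elements as the values.
val rows = Seq(("a", 1), ("a", 2), ("a", 3), ("b", 4), ("b", 5))

val grouped: Map[String, Seq[Int]] =
  rows.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }
// grouped: Map(a -> List(1, 2, 3), b -> List(4, 5))
```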

Upvotes: 0

Views: 1138

Answers (3)

Raphael Roth

Reputation: 27373

You can do it like this:

import org.apache.spark.sql.functions.collect_list
import spark.implicits._ // needed for toDF, $-syntax and .as

val df = Seq(
  ("a", 1),
  ("a", 2),
  ("a", 3),
  ("b", 4),
  ("b", 5)
).toDF("col1", "col2")

// group by col1, collect col2 into a list, then pull the result
// to the driver as a Scala Map
val map: Map[String, Seq[Int]] = df.groupBy($"col1")
  .agg(collect_list($"col2"))
  .as[(String, Seq[Int])]
  .collect().toMap

gives

Map(b -> List(4, 5), a -> List(1, 2, 3))

But be aware that collect() pulls the entire result to the driver, so this will blow up for large datasets.

Upvotes: 0

Vamsi Prabhala

Reputation: 49260

Use the map function together with collect_list to build a map column per group.

import org.apache.spark.sql.functions.{map, collect_list}

val aggdf = df.groupBy($"col1").agg(map($"col1", collect_list($"col2")).alias("mapped"))
aggdf.select($"mapped").show()

Upvotes: 1

Sparker0i

Reputation: 1851

How about this:

val x = df.withColumn("x", array("col2"))
    .groupBy("col1")
    .agg(collect_list("x"))

x.show()

+----+---------------+
|col1|collect_list(x)|
+----+---------------+
|   b|     [[4], [5]]|
|   a|[[1], [2], [3]]|
+----+---------------+

Not exactly what you wanted, but we are a step closer :)
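Once collected, the nested lists from this approach can be flattened in plain Scala (Spark 2.4+ also offers a built-in flatten column function, which would avoid wrapping in the first place); a minimal sketch:

```scala
// Flattening the nested lists produced by collect_list over the
// single-element arrays: [[1], [2], [3]] becomes [1, 2, 3].
val nested = Seq(Seq(1), Seq(2), Seq(3))
val flat: Seq[Int] = nested.flatten
// flat: Seq(1, 2, 3)
```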

Upvotes: 0
