Reputation: 73
I am new to Spark/Scala. I have a DataFrame like the one below:
Col1 | Col2
a    | 1
a    | 2
a    | 3
b    | 4
b    | 5
I want to create a map like this:
a -> [1, 2, 3]
b -> [4, 5]
I am having trouble combining the col2 values based on the col1 value and then building a map with the col1 value as the key.
Upvotes: 0
Views: 1138
Reputation: 27373
You can do it like this:

import org.apache.spark.sql.functions.collect_list
import spark.implicits._ // assumes an existing SparkSession named spark

val df = Seq(
  ("a", 1),
  ("a", 2),
  ("a", 3),
  ("b", 4),
  ("b", 5)
).toDF("col1", "col2")

val map: Map[String, Seq[Int]] = df.groupBy($"col1")
  .agg(collect_list($"col2"))
  .as[(String, Seq[Int])]
  .collect().toMap
which gives:
Map(b -> List(4, 5), a -> List(1, 2, 3))
But be aware that collect() pulls all the grouped data onto the driver, so this will blow up for large datasets.
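Since the collected result is an ordinary Scala Map anyway, the grouping semantics can be illustrated without Spark at all. A minimal sketch of the same transformation on the raw pairs (the names `pairs` and `grouped` are illustrative, not from the answer above):

```scala
// Group the raw (key, value) pairs by key, then drop the key
// from each group's tuples to keep only the values.
val pairs = Seq(("a", 1), ("a", 2), ("a", 3), ("b", 4), ("b", 5))
val grouped: Map[String, Seq[Int]] =
  pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }
// grouped == Map("a" -> Seq(1, 2, 3), "b" -> Seq(4, 5))
```

This is what `groupBy` + `collect_list` + `collect().toMap` computes, just expressed on an in-memory collection.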
Upvotes: 0
Reputation: 49260
Use map with collect_list.
import org.apache.spark.sql.functions.{collect_list, map}

val aggdf = df.groupBy($"col1").agg(map($"col1", collect_list($"col2")).alias("mapped"))
aggdf.select($"mapped").show()
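With this approach each aggregated row carries a one-entry map, so if a single driver-side Map is still the goal, the per-row maps need to be unioned after collecting. A plain-Scala sketch of that combine step (the `rowMaps` value stands in for the collected rows; it is hard-coded here for illustration):

```scala
// Each element models one row's single-entry map column;
// ++ merges them into one Map keyed by col1.
val rowMaps = Seq(Map("a" -> Seq(1, 2, 3)), Map("b" -> Seq(4, 5)))
val combined: Map[String, Seq[Int]] = rowMaps.reduce(_ ++ _)
// combined == Map("a" -> Seq(1, 2, 3), "b" -> Seq(4, 5))
```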
Upvotes: 1
Reputation: 1851
How about this:
val x = df.withColumn("x", array("col2"))
.groupBy("col1")
.agg(collect_list("x"))
x.show()
+----+---------------+
|col1|collect_list(x)|
+----+---------------+
| b| [[4], [5]]|
| a|[[1], [2], [3]]|
+----+---------------+
Not really as you wanted, but we are a step closer :)
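The extra nesting comes from wrapping each col2 value in a single-element array before collect_list; on Spark 2.4+ the flatten function from org.apache.spark.sql.functions can undo that inside the aggregation. A plain-Scala sketch of the same flattening step (data hard-coded for illustration):

```scala
// Mirror of the nested result above: each value is a list of
// single-element lists; flatten collapses one level of nesting.
val nested = Map("a" -> List(List(1), List(2), List(3)), "b" -> List(List(4), List(5)))
val flat: Map[String, List[Int]] = nested.map { case (k, vs) => k -> vs.flatten }
// flat == Map("a" -> List(1, 2, 3), "b" -> List(4, 5))
```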
Upvotes: 0