Spark : How to group by distinct values in DataFrame

Question

I have a data in a file in the following format:

The code I am executing is following:

val sqlContext = new org.apache.spark.sql.SQLContext(sc) 

import spark.implicits._
import sqlContext.implicits._

case class Person(a: Int, b: Int)

val ppl = sc.textFile("newfile.txt").map(_.split(","))
    .map(p=> Person(p(0).trim.toInt, p(1).trim.toInt))
    .toDF()
ppl.registerTempTable("people")

val result = ppl.select("a","b").groupBy('a).agg()
result.show

Expected Output is:

a 32, 33, 44, 23

b 21, 56

Instead of aggregation by sum, count, mean etc. I want every element in the row.

Spark : How to group by distinct values in DataFrame

Answers (1)

Related Questions