pcpp

Reputation: 19

Most frequent value in a dataset in Scala

How can I find the most frequent value in a specific column of a dataset in Scala?

For example, if one of the columns is like this:

Seattle
Barcelona
Lisbon
Barcelona
Montreal
Barcelona
Lisbon

I would need to get "Barcelona" as a result.

Upvotes: 0

Views: 440

Answers (2)

werner

Reputation: 14845

If you are looking for a Spark-based solution, this is the same idea as Jack Koenig's answer, but it uses Spark functions instead of the Scala collection methods:

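// Assumes a SparkSession named `spark` is already in scope (as in spark-shell).
import org.apache.spark.sql.functions.desc
import spark.implicits._ // enables .toDF on local collections

val df = List(
  "Seattle",
  "Barcelona",
  "Lisbon",
  "Barcelona",
  "Montreal",
  "Barcelona",
  "Lisbon"
).toDF("city")

// Count rows per city, sort by count descending, and read the city from the top row.
val max = df
  .groupBy("city")
  .count()
  .sort(desc("count"))
  .head()
  .getString(0)
// max == "Barcelona"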

Upvotes: 1

Jack Koenig

Reputation: 6064

Turning C.S.Reddy's comment into a complete answer:

Scastie link: https://scastie.scala-lang.org/5GIgNMJGTuCVDYrsBa33eg

val xs = List(
  "Seattle",
  "Barcelona",
  "Lisbon",
  "Barcelona",
  "Montreal",
  "Barcelona",
  "Lisbon"
)

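val result =
  xs.groupBy(x => x)                     // Map(city -> list of its occurrences)
    .map { case (k, v) => k -> v.size }  // Map(city -> count)
    .maxBy(_._2)                         // (city, count) pair with the highest count
    ._1                                  // keep only the city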

println(result)
// Barcelona
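
One caveat with the snippet above is that maxBy throws on an empty collection. A minimal sketch of a variant that returns an Option instead, assuming Scala 2.13 or later (for groupMapReduce and maxByOption). The helper name mostFrequent and the value name cities are hypothetical, chosen only for illustration:

```scala
// Hypothetical helper, not part of the original answer; needs Scala 2.13+.
def mostFrequent[A](xs: Iterable[A]): Option[A] =
  xs.groupMapReduce(identity)(_ => 1)(_ + _) // Map(value -> number of occurrences)
    .maxByOption(_._2)                       // None when xs is empty
    .map(_._1)                               // keep only the value

val cities = List(
  "Seattle",
  "Barcelona",
  "Lisbon",
  "Barcelona",
  "Montreal",
  "Barcelona",
  "Lisbon"
)

println(mostFrequent(cities))
// Some(Barcelona)

println(mostFrequent(List.empty[String]))
// None
```

If several values tie for the highest count, this returns one of them, and which one is not specified because it depends on the Map's iteration order.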

Upvotes: 0
