Reputation: 19
How can I find the most frequent value in a specific column of a dataset in Scala?
For example, if one of the columns is like this:
Seattle
Barcelona
Lisbon
Barcelona
Montreal
Barcelona
Lisbon
I would need to get "Barcelona" as a result.
Upvotes: 0
Views: 440
Reputation: 14845
If you are looking for a Spark-based solution, this is the same idea as Jack Koenig's answer, but using Spark functions instead of the Scala collection methods:
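// Assumes a SparkSession named `spark` is in scope (as in spark-shell).
import org.apache.spark.sql.functions.desc
import spark.implicits._ // enables .toDF on local collections

val df = List(
  "Seattle",
  "Barcelona",
  "Lisbon",
  "Barcelona",
  "Montreal",
  "Barcelona",
  "Lisbon"
).toDF("city")

// Count rows per city, sort by count in descending order,
// then read the city name from the first (most frequent) row.
val max = df
  .groupBy("city")
  .count()
  .sort(desc("count"))
  .head()
  .getString(0)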
Upvotes: 1
Reputation: 6064
Turning C.S.Reddy's comment into a complete answer:
Scastie link: https://scastie.scala-lang.org/5GIgNMJGTuCVDYrsBa33eg
val xs = List(
  "Seattle",
  "Barcelona",
  "Lisbon",
  "Barcelona",
  "Montreal",
  "Barcelona",
  "Lisbon"
)
val result =
  xs.groupBy(x => x)
    .map { case (k, v) => k -> v.size }
    .maxBy(_._2)
    ._1
println(result)
// Barcelona
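One caveat with the snippet above: `maxBy` throws on an empty collection. A minimal sketch of a guarded variant follows, assuming you would rather get an `Option` back. The helper name `mostFrequent` is hypothetical and not part of any library.

```scala
// Hypothetical helper: returns None for empty input instead of throwing,
// which is what maxBy does when the collection is empty.
def mostFrequent[A](xs: Seq[A]): Option[A] =
  if (xs.isEmpty) None
  else Some(
    xs.groupBy(identity)                                              // value -> all its occurrences
      .map { case (value, occurrences) => value -> occurrences.size } // value -> count
      .maxBy(_._2)                                                    // entry with the highest count
      ._1
  )

val cities = List("Seattle", "Barcelona", "Lisbon", "Barcelona", "Montreal", "Barcelona", "Lisbon")
println(mostFrequent(cities))             // Some(Barcelona)
println(mostFrequent(List.empty[String])) // None
```

If two values are tied for the highest count, which one is returned depends on the iteration order of the intermediate `Map`, so do not rely on it.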
Upvotes: 0