kornfridge
kornfridge

Reputation: 5200

Finding the most frequent/common element in a collection?

What is the best way to find the most frequent/common element in a collection? For example:

list = List(1, 3, 4, 4, 2)
list.mostCommon   // => 4        !! This is what I want !!

Hmm.. What one could do is to do a groupBy first and then map them by length, and then select the biggest one. So then you would get:

Map(1 -> List(1), 4 -> List(4, 4), 3 -> List(3), 2 -> List(2))
(...)
Map(1 -> 1, 4 -> 2, 3 -> 1, 2 -> 1)  // mapped by length. 4 -> 2  since there's two 4s

And then in the end, choose the key (4) that maps to the highest number (2). (nested question: what is the best way to do that?). But that seems like a lot of work for such a simple operation..?

Is there a better / more idiomatic way to do this?

Upvotes: 17

Views: 13315

Answers (5)

Xavier Guihot
Xavier Guihot

Reputation: 61666

Starting in Scala 2.13, we can use:

  • List::groupMapReduce which is (as its name suggests) an equivalent of a groupBy followed by mapValues and a reduce step.
  • Map::maxByOption instead of a simple maxBy to also handle empty lists:
List(1, 3, 4, 4, 2, 3)
  .groupMapReduce(identity)(_ => 1)(_+_).maxByOption(_._2).map(_._1)
// Option[Int] = Some(4)

This:

  • groups items (group part of groupMapReduce)

  • maps each grouped value occurrence to 1 (map part of groupMapReduce)

  • reduces values within a group of values (_ + _) by summing them (reduce part of groupMapReduce).

  • finally gets the optional maximum by nbr of occurrences and map it to get the corresponding item.


If you know your list is not empty, then a simple maxBy also works:

List(1, 3, 4, 4, 2, 3).groupMapReduce(identity)(_ => 1)(_+_).maxBy(_._2)._1
// 4

The groupMapReduce part is an equivalent version performed in one pass through the sequence of:

List(1, 3, 4, 4, 2, 3).groupBy(identity).mapValues(_.map(_ => 1).reduce(_+_))

Upvotes: 3

Paolo Falabella
Paolo Falabella

Reputation: 25844

No, I think that's the best way. But it's not a lot of work...

list.groupBy(identity).mapValues(_.size)

gives you

Map(2 -> 1, 4 -> 2, 1 -> 2, 3 -> 1)

then, for instance, you can take its .maxBy(_._2) (EDITED: thanks @Travis Brown!) and get a tuple (4,2) (the number that occurs the most times and how many times it occurs)

If you're a fan of one-liners:

scala> List(1, 3, 4, 1, 4, 2).groupBy(identity).mapValues(_.size).maxBy(_._2)
res0: (Int, Int) = (4,2)

Upvotes: 1

Dominic Bou-Samra
Dominic Bou-Samra

Reputation: 15416

I don't think this is really any nicer, but you could do this:

List(1, 3, 4, 4, 2).groupBy(identity).maxBy(_._2.size)._1

Not the nicest solution. What you want is some way to use maxBy on the list and then reference the list like so:

val someList = List(1, 3, 4, 4, 2)
someList.maxBy(x => list.count(_ == x))

Upvotes: 2

hahn
hahn

Reputation: 3658

another variant:

val x = List(1, 3, 4, 1, 4, 2, 5, 5, 5)
 x.distinct.foldLeft((0,0))((a, b) => {
     val cnt = x.count(_ == b);
   if (cnt > a._1) (cnt, b) else a
 })._2

Upvotes: 0

Travis Brown
Travis Brown

Reputation: 139038

I have to say that:

list.groupBy(identity).mapValues(_.size).maxBy(_._2)._1

Or just:

list.groupBy(identity).maxBy(_._2.size)._1

Doesn't really seem like that much work to me.

If you're worried about the overhead of building up the lists for each value when you only need counts, you could do the following:

list.foldLeft(Map.empty[Int, Int].withDefaultValue(0)) {
  case (m, v) => m.updated(v, m(v) + 1)
}.maxBy(_._2)._1

Or even keep track of the maximum as you go, to avoid the extra traversal at the end:

list.foldLeft(
  Map.empty[Int, Int].withDefaultValue(0), -1 -> Double.NegativeInfinity
) {
  case ((m, (maxV, maxCount)), v) =>
    val count = m(v) + 1
    if (count > maxCount) (m.updated(v, count), v -> count)
    else (m.updated(v, count), maxV -> maxCount)
}._2._1

This is obviously much less readable than the one-liners above, though, so I'd recommend sticking with them unless you can show (i.e., with benchmarking, not speculation) that they're a bottleneck in your application.

Upvotes: 26

Related Questions