Reputation: 5200
What is the best way to find the most frequent/common element in a collection? For example:
list = List(1, 3, 4, 4, 2)
list.mostCommon // => 4 !! This is what I want !!
Hmm.. What one could do is to do a groupBy
first and then map
them by length
, and then select the biggest one. So then you would get:
Map(1 -> List(1), 4 -> List(4, 4), 3 -> List(3), 2 -> List(2))
(...)
Map(1 -> 1, 4 -> 2, 3 -> 1, 2 -> 1) // mapped by length. 4 -> 2 since there's two 4s
And then in the end, choose the key (4
) that maps to the highest number (2
). (nested question: what is the best way to do that?). But that seems like a lot of work for such a simple operation..?
Is there a better / more idiomatic way to do this?
Upvotes: 17
Views: 13315
Reputation: 61666
Starting in Scala 2.13
, we can use:
List::groupMapReduce
which is (as its name suggests) an equivalent of a groupBy
followed by mapValues
and a reduce step.Map::maxByOption
instead of a simple maxBy
to also handle empty lists:List(1, 3, 4, 4, 2, 3)
.groupMapReduce(identity)(_ => 1)(_+_).maxByOption(_._2).map(_._1)
// Option[Int] = Some(4)
This:
group
s items (group part of groupMapReduce)
map
s each grouped value occurrence to 1 (map part of groupMapReduce)
reduce
s values within a group of values (_ + _
) by summing them (reduce part of groupMapReduce).
finally gets the optional maximum by nbr of occurrences and map it to get the corresponding item.
If you know your list is not empty, then a simple maxBy
also works:
List(1, 3, 4, 4, 2, 3).groupMapReduce(identity)(_ => 1)(_+_).maxBy(_._2)._1
// 4
The groupMapReduce
part is an equivalent version performed in one pass through the sequence of:
List(1, 3, 4, 4, 2, 3).groupBy(identity).mapValues(_.map(_ => 1).reduce(_+_))
Upvotes: 3
Reputation: 25844
No, I think that's the best way. But it's not a lot of work...
list.groupBy(identity).mapValues(_.size)
gives you
Map(2 -> 1, 4 -> 2, 1 -> 2, 3 -> 1)
then, for instance, you can take its .maxBy(_._2)
(EDITED: thanks @Travis Brown!) and get a tuple (4,2)
(the number that occurs the most times and how many times it occurs)
If you're a fan of one-liners:
scala> List(1, 3, 4, 1, 4, 2).groupBy(identity).mapValues(_.size).maxBy(_._2)
res0: (Int, Int) = (4,2)
Upvotes: 1
Reputation: 15416
I don't think this is really any nicer, but you could do this:
List(1, 3, 4, 4, 2).groupBy(identity).maxBy(_._2.size)._1
Not the nicest solution. What you want is some way to use maxBy on the list and then reference the list like so:
val someList = List(1, 3, 4, 4, 2)
someList.maxBy(x => list.count(_ == x))
Upvotes: 2
Reputation: 3658
another variant:
val x = List(1, 3, 4, 1, 4, 2, 5, 5, 5)
x.distinct.foldLeft((0,0))((a, b) => {
val cnt = x.count(_ == b);
if (cnt > a._1) (cnt, b) else a
})._2
Upvotes: 0
Reputation: 139038
I have to say that:
list.groupBy(identity).mapValues(_.size).maxBy(_._2)._1
Or just:
list.groupBy(identity).maxBy(_._2.size)._1
Doesn't really seem like that much work to me.
If you're worried about the overhead of building up the lists for each value when you only need counts, you could do the following:
list.foldLeft(Map.empty[Int, Int].withDefaultValue(0)) {
case (m, v) => m.updated(v, m(v) + 1)
}.maxBy(_._2)._1
Or even keep track of the maximum as you go, to avoid the extra traversal at the end:
list.foldLeft(
Map.empty[Int, Int].withDefaultValue(0), -1 -> Double.NegativeInfinity
) {
case ((m, (maxV, maxCount)), v) =>
val count = m(v) + 1
if (count > maxCount) (m.updated(v, count), v -> count)
else (m.updated(v, count), maxV -> maxCount)
}._2._1
This is obviously much less readable than the one-liners above, though, so I'd recommend sticking with them unless you can show (i.e., with benchmarking, not speculation) that they're a bottleneck in your application.
Upvotes: 26