Reputation: 8601
I have a list with assorted keywords that may repeat. I need to generate a list with distinct keywords but sorted by the frequency of which they appeared on the original list.
How would be the idiomatic Scala for that? Here is a working but ugly implementation:
val keys = List("c","a","b","b","a","a")
keys.groupBy(p => p).toList.sortWith( (a,b) => a._2.size > b._2.size ).map(_._1)
// List("a","b","c")
Upvotes: 3
Views: 2015
Reputation: 1
My prefered versions would be:
Most canonical / expressive?
keys.groupBy(identity).toList.map{ case (k,v) => (-v.size,k) }.sorted.map(_._2)
Shortest and probably most efficient?
keys.groupBy(identity).toList.sortBy(-_._2.size).map(_._1)
Straight forward
keys.groupBy(identity).values.toList.sortBy(-_.size).map(_.head)
Upvotes: 0
Reputation: 18859
Just a little change from @Daniel 's 4th version, may have a better performance:
scala> def sortByFreq[T](xs: List[T]): List[T] = {
| val freq = collection.mutable.Map.empty[T, Int] withDefaultValue 0
| xs foreach (k => freq(k) -= 1)
| xs.distinct sortBy freq
| }
sortByFreq: [T](xs: List[T])List[T]
scala> sortByFreq(keys)
res2: List[String] = List(a, b, c)
Upvotes: 0
Reputation: 236
How about:
keys.distinct.sorted
Newbie didn't read the question carefully. Let me try again:
keys.foldLeft (Map[String,Int]()) { (counts, elem) => counts + (elem -> (counts.getOrElse(elem, 0) - 1))}
.toList.sortBy(_._2).map(_._1)
Could use a mutable Map if you prefer. Negative frequency counts are stored in the map. If that bothers you, you can make them positive and negate the sortBy argument.
Upvotes: 0
Reputation: 2404
Perhaps,
val mapCount = keys.map(x => (x,keys.count(_ == x))).distinct
// mapCount : List[(java.lang.String, Int)] = List((c,1), (a,3), (b,2))
val sortedList = mapCount.sortWith(_._2 > _._2).map(_._1)
// sortedList : List[java.lang.String] = List(a, b, c)
Upvotes: 0
Reputation: 6930
fold version:
val keys = List("c","a","b","b","a","a")
val keysCounts =
(Map.empty[String, Int] /: keys) { case (counts, k) =>
counts updated (k, (counts getOrElse (k, 0)) + 1)
}
keysCounts.toList sortBy { case (_, count) => -count } map { case (w, _) => w }
Upvotes: 1
Reputation: 297205
Shorter version:
keys.distinct.sortBy(keys count _.==).reverse
That is not particular efficient, however. The groupBy
version ought to perform better, though it can be improved:
keys.groupBy(identity).toSeq.sortBy(_._2.size).map(_._1)
One can also get rid of the reverse
in the first version by declaring an Ordering
:
val ord = Ordering by (keys count (_: String).==)
keys.distinct.sorted(ord.reverse)
Note that reverse
in this version just produces a new Ordering
that works in the opposite manner of the original. This version also suggests a way to get better performance:
val freq = collection.mutable.Map.empty[String, Int] withDefaultValue 0
keys foreach (k => freq(k) += 1)
val ord = Ordering by freq
keys.distinct.sorted(ord.reverse)
Upvotes: 8
Reputation: 19177
Here's my take, don't know if it's less "ugly":
scala> keys.groupBy(p => p).values.toList.sortBy(_.size).reverse.map(_.head)
res39: List[String] = List(a, b, c)
Upvotes: 2
Reputation: 8463
Nothing wrong with that implementation that comments can't fix! Seriously, break it down a bit and describe what & why you're taking each step.
Not as "concise" perhaps, but the purpose of concise code in scala is to make code more readable. When concise code is not clear it's time to back up, break up (introduce well named local variables), and comment.
Upvotes: 2