Johnny Everson
Johnny Everson

Reputation: 8601

How to remove duplicates from a list then sort by most frequent

I have a list with assorted keywords that may repeat. I need to generate a list with distinct keywords but sorted by the frequency of which they appeared on the original list.

How would be the idiomatic Scala for that? Here is a working but ugly implementation:

val keys = List("c","a","b","b","a","a")
keys.groupBy(p => p).toList.sortWith( (a,b) => a._2.size > b._2.size ).map(_._1)
// List("a","b","c")

Upvotes: 3

Views: 2015

Answers (8)

sibaisi
sibaisi

Reputation: 1

My prefered versions would be:

Most canonical / expressive?

keys.groupBy(identity).toList.map{ case (k,v) => (-v.size,k)  }.sorted.map(_._2)

Shortest and probably most efficient?

keys.groupBy(identity).toList.sortBy(-_._2.size).map(_._1)

Straight forward

keys.groupBy(identity).values.toList.sortBy(-_.size).map(_.head)

Upvotes: 0

Eastsun
Eastsun

Reputation: 18859

Just a little change from @Daniel 's 4th version, may have a better performance:

scala> def sortByFreq[T](xs: List[T]): List[T] = {
     |   val freq = collection.mutable.Map.empty[T, Int] withDefaultValue 0
     |   xs foreach (k => freq(k) -= 1)
     |   xs.distinct sortBy freq
     | }
sortByFreq: [T](xs: List[T])List[T]

scala> sortByFreq(keys)
res2: List[String] = List(a, b, c)

Upvotes: 0

trenobus
trenobus

Reputation: 236

How about:

keys.distinct.sorted

Newbie didn't read the question carefully. Let me try again:

keys.foldLeft (Map[String,Int]()) { (counts, elem) => counts + (elem -> (counts.getOrElse(elem, 0) - 1))}
    .toList.sortBy(_._2).map(_._1)

Could use a mutable Map if you prefer. Negative frequency counts are stored in the map. If that bothers you, you can make them positive and negate the sortBy argument.

Upvotes: 0

Shrey
Shrey

Reputation: 2404

Perhaps,

val mapCount = keys.map(x => (x,keys.count(_ == x))).distinct
// mapCount  : List[(java.lang.String, Int)] = List((c,1), (a,3), (b,2))

val sortedList = mapCount.sortWith(_._2 > _._2).map(_._1)
// sortedList  : List[java.lang.String] = List(a, b, c)

Upvotes: 0

Sergey Passichenko
Sergey Passichenko

Reputation: 6930

fold version:

val keys = List("c","a","b","b","a","a")

val keysCounts =
    (Map.empty[String, Int] /: keys) { case (counts, k) =>
        counts updated (k, (counts getOrElse (k, 0)) + 1)
    }

keysCounts.toList sortBy { case (_, count) => -count } map { case (w, _) => w }

Upvotes: 1

Daniel C. Sobral
Daniel C. Sobral

Reputation: 297205

Shorter version:

keys.distinct.sortBy(keys count _.==).reverse

That is not particular efficient, however. The groupBy version ought to perform better, though it can be improved:

keys.groupBy(identity).toSeq.sortBy(_._2.size).map(_._1)

One can also get rid of the reverse in the first version by declaring an Ordering:

val ord = Ordering by (keys count (_: String).==)
keys.distinct.sorted(ord.reverse)

Note that reverse in this version just produces a new Ordering that works in the opposite manner of the original. This version also suggests a way to get better performance:

val freq = collection.mutable.Map.empty[String, Int] withDefaultValue 0
keys foreach (k => freq(k) += 1)
val ord = Ordering by freq
keys.distinct.sorted(ord.reverse)

Upvotes: 8

Denis Tulskiy
Denis Tulskiy

Reputation: 19177

Here's my take, don't know if it's less "ugly":

scala> keys.groupBy(p => p).values.toList.sortBy(_.size).reverse.map(_.head)
res39: List[String] = List(a, b, c)

Upvotes: 2

Richard Sitze
Richard Sitze

Reputation: 8463

Nothing wrong with that implementation that comments can't fix! Seriously, break it down a bit and describe what & why you're taking each step.

Not as "concise" perhaps, but the purpose of concise code in scala is to make code more readable. When concise code is not clear it's time to back up, break up (introduce well named local variables), and comment.

Upvotes: 2

Related Questions