Reputation: 1
Output the word (or list of words) that occurs the most times in the text file (irrespective of case – i.e. “word” and “Word” are treated the same for this purpose). We are only interested in words that contain alphabetic characters [A-Z a-z], so ignore any digits (numbers), punctuation, etc.
If there are several words that occur most often with equal frequency then all these words should be printed as a list. Alongside the word(s) you should output the number of occurrences. For example:
The word(s) that occur most often are [“and”, “it”, “the”] each with 10 occurrences in the text.
I have the following code:
val counter: Map[String, Int] = scala.io.Source.fromFile(file).getLines
  .flatMap(_.split("[^-A-Za-z]+")).foldLeft(Map.empty[String, Int]) {
    (count, word) => count + (word.toLowerCase -> (count.getOrElse(word.toLowerCase, 0) + 1))
  }
val list = counter.toList.sortBy(_._2).reverse
This goes as far as creating a list of the words in descending order of occurrences. I don't know how to proceed from here.
Upvotes: 0
Views: 46
Reputation: 40500
Well, you are almost there ...
val maxNum = list.headOption.fold(0)(_._2) // What's the max number? (head of the sorted list; the Map itself is unordered)
list
  .iterator // not necessary, but avoids building an intermediate list at each chained step
  .takeWhile(_._2 == maxNum) // Get all words that have that count
  .map(_._1) // drop the counts, keep only words
  .foreach(println) // Print them out
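To check the takeWhile idea in isolation, here is a minimal sketch that works on an in-memory counts map instead of a file. The names `SortedTop` and `topWords` are hypothetical, and it assumes Scala 2.13 or later. It takes the maximum from the head of the *sorted* list and sorts the tied words alphabetically so the output does not depend on Map iteration order.

```scala
object SortedTop {
  // Hypothetical helper: returns the most frequent word(s), sorted alphabetically,
  // together with their shared count. An empty map yields (Nil, 0).
  def topWords(counter: Map[String, Int]): (List[String], Int) = {
    // Sort by descending count, so the head holds the maximum.
    val list = counter.toList.sortBy { case (_, n) => -n }
    val maxNum = list.headOption.fold(0)(_._2)
    val words = list.iterator
      .takeWhile(_._2 == maxNum) // stop at the first word with a lower count
      .map(_._1)                 // keep only the words
      .toList
      .sorted                    // ties arrive in map order, so sort them for stable output
    (words, maxNum)
  }

  def main(args: Array[String]): Unit = {
    val (words, n) = topWords(Map("the" -> 10, "and" -> 10, "it" -> 10, "cat" -> 2))
    println(s"$words $n") // List(and, it, the) 10
  }
}
```

Because the list is sorted in descending order, takeWhile can stop as soon as the count drops, which is why it (and not filter) is the right tool here.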
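One kinda major problem with your solution is that you shouldn't sort the list just to find the maximum, as pointed out in the comment: sorting costs O(n log n), whereas a single O(n) pass finds the maximum. Just do (maxByOption needs Scala 2.13+):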
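val maxNum = counter.maxByOption(_._2).fold(0)(_._2) // one pass to find the highest count
counter
  .iterator
  .collect { case (w, `maxNum`) => w } // backticks compare against the existing maxNum instead of binding a new variable
  .foreach(println)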
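The question also asks for a specific output sentence, which the snippet above does not produce. Below is a minimal sketch, assuming Scala 2.13+, that combines the maxByOption/collect approach with that formatting; `MaxTop`, `topWords` and `describe` are hypothetical names, and the words are sorted alphabetically only to make the output deterministic.

```scala
object MaxTop {
  // Hypothetical helper: one pass to find the maximum, one pass to collect the ties.
  def topWords(counter: Map[String, Int]): (List[String], Int) = {
    val maxNum = counter.maxByOption(_._2).fold(0)(_._2)
    val words = counter.iterator
      .collect { case (w, `maxNum`) => w } // stable-identifier pattern: matches count == maxNum
      .toList
      .sorted
    (words, maxNum)
  }

  // Formats the result like the example sentence in the question.
  def describe(counter: Map[String, Int]): String = {
    val (words, n) = topWords(counter)
    val quoted = words.map(w => "\"" + w + "\"").mkString("[", ", ", "]")
    s"The word(s) that occur most often are $quoted each with $n occurrences in the text."
  }

  def main(args: Array[String]): Unit =
    println(describe(Map("the" -> 10, "and" -> 10, "it" -> 10, "cat" -> 2)))
}
```

Without the backticks, `maxNum` inside the pattern would be a fresh variable that matches every count, so every word would be printed.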
Also, a bit of a "cosmetic" improvement to your counting is to use groupMapReduce (Scala 2.13+), which does what you've accomplished with foldLeft a bit more elegantly:
val counter = scala.io.Source.fromFile(file).getLines()
  .flatMap(_.split("\\b")) // \b is a regex symbol for "word boundary"
  .filter(_.matches("[A-Za-z]+")) // keep purely alphabetic tokens, dropping the delimiters - your split has a little bug that counts empty strings (e.g. from blank lines) as "words"
  .toList // groupMapReduce is defined on collections, not on Iterator
  .groupMapReduce(_.toLowerCase)(_ => 1)(_ + _) // group data by lower-cased word, replace each occurrence of a word with `1`, and add them all up
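Here is a self-contained sketch of that counting pipeline that reads from a string instead of a file, so it can be run directly. It assumes Scala 2.13+; `WordCount` and `count` are hypothetical names. Splitting on `\b` turns "hat." into the tokens "hat" and ".", and the alphabetic filter then discards punctuation, whitespace and digit tokens such as "42".

```scala
import scala.io.Source

object WordCount {
  // Hypothetical helper: counts purely alphabetic words, case-insensitively.
  def count(source: Source): Map[String, Int] =
    source.getLines()
      .flatMap(_.split("\\b"))        // split each line at word boundaries
      .filter(_.matches("[A-Za-z]+")) // keep alphabetic tokens; drops delimiters and "42"
      .toList                         // groupMapReduce needs a collection, not an Iterator
      .groupMapReduce(_.toLowerCase)(_ => 1)(_ + _)

  def main(args: Array[String]): Unit = {
    val text = "The cat and the hat.\nIt is 42, and it is THE end."
    println(count(Source.fromString(text)))
  }
}
```

For a real file you would probably wrap `Source.fromFile(...)` in `scala.util.Using` so the file handle gets closed; the result can then be fed to either of the max-finding snippets above.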
Upvotes: 1