whoisalex

Reputation: 1

Functional programming in Scala: Output the word (or list of words) that occurs the most times in the text file?

Output the word (or list of words) that occurs the most times in the text file (irrespective of case – i.e. “word” and “Word” are treated the same for this purpose). We are only interested in words that contain alphabetic characters [A-Z a-z], so ignore any digits (numbers), punctuation, etc.

If there are several words that occur most often with equal frequency then all these words should be printed as a list. Alongside the word(s) you should output the number of occurrences. For example:

The word(s) that occur most often are [“and”, “it”, “the”] each with 10 occurrences in the text.

I have the following code:

    val counter: Map[String, Int] = scala.io.Source.fromFile(file).getLines
      .flatMap(_.split("[^-A-Za-z]+"))
      .foldLeft(Map.empty[String, Int]) { (count, word) =>
        val key = word.toLowerCase // look up and store under the same lower-cased key
        count + (key -> (count.getOrElse(key, 0) + 1))
      }
    val list = counter.toList.sortBy(_._2).reverse

This gets as far as building a list of the words in descending order of occurrences. I don't know how to proceed from here.

Upvotes: 0

Views: 46

Answers (1)

Dima

Reputation: 40500

Well, you are almost there ...

   val maxNum = list.headOption.fold(0)(_._2) // What's the max number? (take it from the sorted list, not the unordered map)
   list
     .iterator // not necessary, but makes it a bit faster to perform chained transformations
     .takeWhile(_._2 == maxNum) // Get all words that have that count
     .map(_._1) // drop the counts, keep only words
     .foreach(println) // Print them out

One fairly major problem with your solution is that it sorts the whole list just to find the maximum, as pointed out in the comment. Sorting costs O(n log n), while a single pass over the map finds the maximum in O(n). Just do

    val maxNum = counter.maxByOption(_._2).fold(0)(_._2)
    counter
     .iterator
     .collect { case (w, `maxNum`) => w }
     .foreach(println)
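
If you also want the output in the sentence format from the question, something along these lines should work. This is only a sketch: the object name `MostFrequent` and the helpers `mostFrequent` and `describe` are made up for illustration, and it assumes Scala 2.13 or later (for `maxByOption`).

```scala
object MostFrequent {
  // Returns every word that shares the highest count, plus that count.
  // An empty map yields (Nil, 0). maxByOption needs Scala 2.13 or later.
  def mostFrequent(counter: Map[String, Int]): (List[String], Int) = {
    val maxNum = counter.maxByOption(_._2).fold(0)(_._2)
    val words = counter.iterator
      .collect { case (w, `maxNum`) => w } // backticks match against the value of maxNum
      .toList
      .sorted // a Map has no defined order, so sort for stable output
    (words, maxNum)
  }

  // Formats the result in the shape of the example sentence from the question.
  def describe(words: List[String], count: Int): String =
    words
      .map(w => "\"" + w + "\"")
      .mkString(
        "The word(s) that occur most often are [",
        ", ",
        s"] each with $count occurrences in the text."
      )
}
```

With your `counter`, you would call `MostFrequent.mostFrequent(counter)` and pass the resulting words and count to `MostFrequent.describe` before printing.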

Also, as a "cosmetic" improvement to your counting, you can use groupMapReduce (available since Scala 2.13), which does what you've accomplished with foldLeft a bit more elegantly:

    val counter = source.getLines
        .flatMap(_.split("\\b")) // \b is a regex symbol for "word boundary"
        .filter(_.matches("\\w+")) // filter out the delimiters - you have a little bug here, that results in your counting empty strings as "words"
        .map(_.toLowerCase) // "word" and "Word" count as the same word
        .toSeq // groupMapReduce is defined on collections, not on Iterator
        .groupMapReduce(identity)(_ => 1)(_ + _) // group data by word, replace each occurrence of a word with `1`, and add them all up
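
To try the counting without a file, here is a small self-contained variant. It is a sketch under a few assumptions: `WordCount` and `countWords` are hypothetical names, the `lines` parameter stands in for `source.getLines`, and it splits on runs of non-letters (close to your original regex, minus the hyphen) rather than on `\b`, so that digits are dropped as the task requires.

```scala
object WordCount {
  // Counts alphabetic words case-insensitively (Scala 2.13+ for groupMapReduce).
  def countWords(lines: Iterator[String]): Map[String, Int] =
    lines
      .flatMap(_.split("[^A-Za-z]+")) // split on anything that is not a letter
      .filter(_.nonEmpty)             // split yields "" when a line starts with a delimiter
      .map(_.toLowerCase)             // treat "Word" and "word" as the same word
      .toSeq                          // groupMapReduce is not available on Iterator
      .groupMapReduce(identity)(_ => 1)(_ + _)
}
```

For example, `WordCount.countWords(Iterator("The cat and the hat.", "THE end, 42 times"))` counts "the" three times and ignores "42" entirely.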

Upvotes: 1
