Jerome ROBERT

Reputation: 337

Populating a map in a loop [Scala]

I’m afraid this is another noob question.

What I want to do is use a Map to count how often each word appears in a poe…m and then print the results to the console. I came up with the following code, which I believe works (though it is probably not quite idiomatic):

val poe_m="""Once upon a midnight dreary, while I pondered weak and weary,
            |Over many a quaint and curious volume of forgotten lore,
            |While I nodded, nearly napping, suddenly there came a tapping,
            |As of some one gently rapping, rapping at my chamber door.
            |`'Tis some visitor,' I muttered, `tapping at my chamber door -
            |Only this, and nothing more.'"""

val separators=Array(' ',',','.','-','\n','\'','`')
var words=new collection.immutable.HashMap[String,Int]
for(word<-poe_m.stripMargin.split(separators) if(!word.isEmpty))  
    words=words+(word.toLowerCase -> (words.getOrElse(word.toLowerCase,0)+1))

words.foreach(entry=>println("Word : "+entry._1+" count : "+entry._2))

As far as I understand, in Scala immutable data structures are preferred to mutable ones, and val is preferred to var, so I'm facing a dilemma: words has to be a var (so that a new Map instance can be assigned on each iteration) if the results are stored in an immutable Map, while turning words into a val implies using a mutable Map.

Could someone enlighten me about the proper way to deal with this existential problem?

Upvotes: 4

Views: 3929

Answers (5)

Don Mackenzie

Reputation: 7963

Credit lies elsewhere (Travis and Daniel in particular) for what follows, but there was a simpler one-liner needing to get out.

val words = poe_m split "\\W+" groupBy identity mapValues {_.size}

There's a simplification in that you won't need stripMargin, because the regex, as suggested by Daniel, disposes of the margin characters as well.

You could retain the _.isEmpty filtering to protect against the edge case for the empty String which yields ("" -> 1) if you want.
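As a rough sketch of what keeping that filter might look like, the variant below also lowercases each token, as the question's code does. The object name RegexWordCount and the method name count are made up for illustration, and a plain map is used instead of mapValues so that the result is a strict Map on any recent Scala version.

```scala
object RegexWordCount {
  // Split on runs of non-word characters, drop empty tokens,
  // lowercase what is left, then count each distinct word.
  def count(text: String): Map[String, Int] =
    text.split("\\W+")
      .filter(_.nonEmpty)
      .map(_.toLowerCase)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.length }
}
```

With this version, RegexWordCount.count("") returns an empty Map rather than ("" -> 1).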

Upvotes: 1

sunsations

Reputation: 369

This is how it is done in the very good book "Programming in Scala: A Comprehensive Step-by-Step Guide, 2nd Edition" by Martin Odersky, Lex Spoon, and Bill Venners:

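import scala.collection.mutable

def countWords(text: String) = {
  // The map itself is mutable, so the reference can stay a val.
  val counts = mutable.Map.empty[String, Int]
  for (rawWord <- text.split("[ ,!.]+")) {
    val word = rawWord.toLowerCase
    // Look up the current count, defaulting to 0 for a new word.
    val oldCount =
      if (counts.contains(word)) counts(word)
      else 0
    counts += (word -> (oldCount + 1))
  }
  counts
}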

However, it also uses a mutable Map.
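One possible way to soften this, sketched here rather than taken from the book, is to keep the mutable Map local to the function and hand back an immutable snapshot with toMap, so callers never see the mutation. The object name LocalMutableCount is hypothetical, and the empty-token guard is an addition of this sketch.

```scala
import scala.collection.mutable

object LocalMutableCount {
  def countWords(text: String): Map[String, Int] = {
    // Mutation is confined to this local value.
    val counts = mutable.Map.empty[String, Int]
    for (rawWord <- text.split("[ ,!.]+") if rawWord.nonEmpty) {
      val word = rawWord.toLowerCase
      counts(word) = counts.getOrElse(word, 0) + 1
    }
    // Return an immutable copy, so the result cannot change afterwards.
    counts.toMap
  }
}
```

The trade-off is one extra copy at the end in exchange for an immutable return type.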

Upvotes: 1

Eduardo

Reputation: 8412

I am a noob with Scala too, so, there may be better ways to do it. I have come up with the following:

poe_m.stripMargin.split(separators)
     .filter(x => !x.isEmpty)
     .groupBy(x => x).foreach {
        case(w,ws) => println(w + " " + ws.size)
     }

By applying successive functions, you avoid the need for vars and mutable collections.

Upvotes: 2

Christopher Chiche

Reputation: 15345

Well, in functional programming it is preferred to use immutable objects and to update them through functions (for example, a tail-recursive function returning the updated map). However, if you are not dealing with heavy loads, you should prefer the mutable map to the use of var, not because it is more powerful (even if I think it should be) but because it is easier to use.

Finally, Travis Brown's answer is a solution to your concrete problem; mine is more of a personal philosophy.
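For illustration, the tail-recursive function mentioned in this answer might look roughly like the sketch below. The names RecursiveCount, countWords and loop are invented here, and the input is assumed to be already split into tokens.

```scala
import scala.annotation.tailrec

object RecursiveCount {
  def countWords(tokens: List[String]): Map[String, Int] = {
    // Each call passes an updated immutable Map to the next one,
    // so no var and no mutable collection is needed.
    @tailrec
    def loop(remaining: List[String], acc: Map[String, Int]): Map[String, Int] =
      remaining match {
        case Nil => acc
        case head :: tail =>
          loop(tail, acc.updated(head, acc.getOrElse(head, 0) + 1))
      }
    loop(tokens, Map.empty)
  }
}
```

The @tailrec annotation asks the compiler to reject the code if the recursive call is not in tail position.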

Upvotes: 2

Travis Brown

Reputation: 139058

In this case you can use groupBy and mapValues:

val tokens = poe_m.stripMargin.split(separators).filterNot(_.isEmpty)
val words = tokens.groupBy(w => w).mapValues(_.size)

More generally this is a job for a fold:

 val words = tokens.foldLeft(Map.empty[String, Int]) {
   case (m, t) => m.updated(t, m.getOrElse(t, 0) + 1)
 }

The Wikipedia entry on folds gives some good clarifying examples.
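To connect the fold back to the loop in the question, here is a small sketch that puts both side by side on a made-up token list. The names FoldCount, withVar and withFold are hypothetical. The accumulator m in the fold plays the role that the var plays in the loop.

```scala
object FoldCount {
  val tokens = List("tapping", "rapping", "rapping")

  // Loop version: a var that is rebound to a new immutable Map each time.
  def withVar: Map[String, Int] = {
    var words = Map.empty[String, Int]
    for (t <- tokens) words = words.updated(t, words.getOrElse(t, 0) + 1)
    words
  }

  // Fold version: foldLeft threads the Map through each step,
  // so the rebinding happens inside the library call.
  def withFold: Map[String, Int] =
    tokens.foldLeft(Map.empty[String, Int]) { (m, t) =>
      m.updated(t, m.getOrElse(t, 0) + 1)
    }
}
```

Both methods build the same Map; the fold just removes the need to declare the var yourself.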

Upvotes: 10
