jdk2588

Reputation: 792

Do word count by taking input from file line by line in Scala?

I have a source file that contains words, and I want to do a typical word count. I am currently using the following, which converts the input to an Array and loads everything into memory:

def freqMap(lines: Iterator[String]): Map[String, Int] = {

   val mappedWords: Array[(String, Int)] = lines.toArray.flatMap((l: String) => l.split(delimiter).map((word: String) => (word, 1)))

   val frequencies = mappedWords.groupBy((e) => e._1).map { case (key, elements) => elements.reduce((x, y) => (y._1, x._2 + y._2)) }

   frequencies
}

But I want to evaluate the input line by line and show output as each line is processed. How can this be done lazily, without putting everything into memory?

Upvotes: 0

Views: 320

Answers (2)

ssn

Reputation: 1

I think what you're looking for is the scanLeft method. An example solution might look like this:

val iter = List("this is line number one", "this is line number two", "this this this").toIterator

val solution = iter.flatMap(_.split(" ")).scanLeft[Map[String, Int]](Map.empty) {
  case (acc, word) =>
    println(word)
    acc.updated(word, acc.getOrElse(word, 0) + 1)
}

It's all lazy and pull based. If you execute println(solution.take(5).toList), this will get printed to the console:

this
is
line
number
one
List(Map(), Map(this -> 1), Map(this -> 1, is -> 1), Map(this -> 1, is -> 1, line -> 1), Map(this -> 1, is -> 1, line -> 1, number -> 1))
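Since the question asks for output per line rather than per word, the same scanLeft idea can be applied one level up. This is only a minimal sketch: the object name RunningWordCount and the method runningCounts are made up for illustration, and splitting on whitespace is an assumption, since the question's delimiter is not shown.

```scala
object RunningWordCount {
  // Hypothetical helper: yields the cumulative word counts after each line.
  // The first element is the empty seed map, before any line has been read.
  def runningCounts(lines: Iterator[String]): Iterator[Map[String, Int]] =
    lines.scanLeft(Map.empty[String, Int]) { (acc, line) =>
      // Assumption: words are separated by whitespace; empty tokens are dropped.
      line.split("\\s+").filter(_.nonEmpty).foldLeft(acc) { (m, word) =>
        m.updated(word, m.getOrElse(word, 0) + 1)
      }
    }
}
```

Something like RunningWordCount.runningCounts(lines).drop(1).foreach(println) would then print one cumulative map per line while skipping the empty seed. Only the current map is held in memory, not the lines themselves.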

Upvotes: 0

jwvh

Reputation: 51271

You say you don't want to put everything in memory, but you want to "show output as every line is processed." That sounds like you just want to println the intermediate results.

lines.foldLeft(Map[String,Int]()){ case (mp,line) =>
  println(mp)  // output intermediate results
  line.split(" ").foldLeft(mp){ case (m,word) =>
      m.lift(word).fold(m + (word -> 1))(c => m + (word -> (c+1)))
  }
}

The iterator (lines) is consumed one line at a time. The Map result is built word by word and carried forward line by line as the foldLeft accumulator.
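To tie this to an actual file, here is a rough sketch of how the fold might be wired to scala.io.Source. The names FoldWordCount, countLines, countFile and the report callback are hypothetical, and scala.util.Using assumes Scala 2.13 or later. Unlike the snippet above, it reports the counts after each line rather than before.

```scala
import scala.io.Source
import scala.util.Using

object FoldWordCount {
  // Hypothetical helper: folds over the lines and calls `report`
  // with the updated counts after each line has been processed.
  def countLines(lines: Iterator[String])(report: Map[String, Int] => Unit): Map[String, Int] =
    lines.foldLeft(Map.empty[String, Int]) { (counts, line) =>
      val updated = line.split(" ").filter(_.nonEmpty).foldLeft(counts) { (m, word) =>
        m.updated(word, m.getOrElse(word, 0) + 1)
      }
      report(updated) // intermediate result for this line
      updated
    }

  // Assumption: Scala 2.13+ for scala.util.Using; `path` is a placeholder.
  // getLines() is a lazy iterator, so the file is read one line at a time.
  def countFile(path: String): Map[String, Int] =
    Using.resource(Source.fromFile(path)) { src =>
      countLines(src.getLines())(println)
    }
}
```

Passing the reporting function as a parameter keeps the counting logic testable without capturing console output.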

Upvotes: 1
