Reputation: 792
I have a source file which contains words and want to do typical word count, I am using something which converts to Array and takes into memory
def freqMap(lines: Iterator[String]): Map[String, Int] = {
val mappedWords: Array[(String, Int)] = lines.toArray.flatMap((l: String) => l.split(delimiter).map((word: String) => (word, 1)))
val frequencies = mappedWords.groupBy((e) => e._1).map { case (key, elements) => elements.reduce((x, y) => (y._1, x._2 + y._2)) }
frequencies
}
But I want to evaluate line by line and show output as every line is processed. How can this be done lazily and without putting everything into memory
Upvotes: 0
Views: 320
Reputation: 1
I think what you're looking for are scanLeft method. So example solution might look like this:
val iter = List("this is line number one", "this is line number two", "this this this").toIterator
val solution = iter.flatMap(_.split(" ")).scanLeft[Map[String, Int]](Map.empty){
case (acc, word) =>
println(word)
acc.updated(word, acc.getOrElse(word, 0) + 1)
}
It's all lazy and pull based, if you execute val solution = iter.flatMap(_.split(" ")).scanLeftMap[String, Int]{ case (acc, word) => println(word) acc.updated(word, acc.getOrElse(word, 0) + 1) }
println(solution.take(3).toList)
this will get printed to the console:
val solution = iter.flatMap(_.split(" ")).scanLeft[Map[String, Int]](Map.empty){
case (acc, word) =>
println(word)
acc.updated(word, acc.getOrElse(word, 0) + 1)
}
this
is
line
number
one
List(Map(), Map(this -> 1), Map(this -> 1, is -> 1), Map(this -> 1, is -> 1, line -> 1), Map(this -> 1, is -> 1, line -> 1, number -> 1))
Upvotes: 0
Reputation: 51271
You say you don't want to put everything in memory, but you want to "show output as every line is processed." That sounds like you just want to println
the intermediate results.
lines.foldLeft(Map[String,Int]()){ case (mp,line) =>
println(mp) // output intermediate results
line.split(" ").foldLeft(mp){ case (m,word) =>
m.lift(word).fold(m + (word -> 1))(c => m + (word -> (c+1)))
}
}
The iterator (lines
) is consumed one at a time. The Map
result is built word-by-word and carried forward line-by-line as the foldLeft
accumulator.
Upvotes: 1