Reputation: 49
I am trying to read words from a file and count them using a map. I want to ignore spaces when reading from the file.
import scala.io.Source

val lines = Source.fromFile("file path","utf-8").getLines()
val counts = new collection.mutable.HashMap[String, Int].withDefaultValue(0)
lines.flatMap(line => line.split(" ")).foreach(word => counts(word) += 1)
for ((key, value) <- counts) println (key + "-->" + value)
When I run this code on the following input:
hello hello
world goodbye hello
world
the output is
world-->2
goodbye-->1
hello-->3
-->2
It counts 2 spaces. How can I fix that?
Upvotes: 1
Views: 1042
Reputation: 20415
This approach splits each line on "\\W+" (one or more non-word characters), so the words are extracted regardless of how many spaces separate them:
Source.fromFile("filepath")
.getLines
.flatMap(_.trim.split("\\W+"))
.toArray.groupBy(identity)
.map ( kv => kv._1 -> kv._2.size )
Hence
res: Map(world -> 2, goodbye -> 1, hello -> 3)
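As a small sketch of how the pattern choice affects the result (the sample line and the object name are made up for illustration): "\\W+" treats any run of non-word characters as a separator, so punctuation and apostrophes split words too, while "\\s+" splits on whitespace only. A blank line still yields one empty string after trim, so a nonEmpty filter may be needed if the file can contain blank lines.

```scala
object SplitPatterns {
  // Hypothetical sample line with repeated spaces and punctuation.
  val line = "  hello   world, don't  go "

  // "\\W+" treats every run of non-word characters as a separator.
  val byNonWord: List[String] = line.trim.split("\\W+").toList

  // "\\s+" only treats runs of whitespace as a separator.
  val byWhitespace: List[String] = line.trim.split("\\s+").toList

  // Splitting an empty string still returns one empty token.
  val blank: List[String] = "".trim.split("\\W+").toList
  val blankFiltered: List[String] = blank.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    println(byNonWord)     // List(hello, world, don, t, go)
    println(byWhitespace)  // List(hello, world,, don't, go)
    println(blank.length)  // 1
    println(blankFiltered) // List()
  }
}
```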
Upvotes: 0
Reputation: 15783
One way is to use filter to drop the empty strings that split(" ") produces when there are consecutive spaces:
lines
.flatMap(line => line.split(" "))
.filter(_.nonEmpty)
.foreach(word => counts(word) += 1)
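To see what actually needs to be filtered, here is a minimal sketch (the sample line and the object name are made up). Splitting on a single space turns two adjacent spaces into an empty string "", not into a space " ", so comparing tokens against " " would leave the empty entries in place.

```scala
object SplitOnSpace {
  // Hypothetical line with two spaces between the words.
  val tokens: List[String] = "hello  world".split(" ").toList

  // Comparing against " " removes nothing, because the extra token is "".
  val comparedWithSpace: List[String] = tokens.filter(_ != " ")

  // Dropping empty tokens leaves only the real words.
  val withoutEmpty: List[String] = tokens.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    println(tokens.length)            // 3
    println(comparedWithSpace.length) // 3
    println(withoutEmpty)             // List(hello, world)
  }
}
```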
That said, I think there is a better approach: force the iterator to evaluate with the toList method, then use groupBy with collect:
Iterator("some word", "some other")
.flatMap(_.split(" "))
.toList
.groupBy(identity)
.collect { case (a,b) if !a.isEmpty => (a, b.length)}
This outputs:
Map(some -> 2, word -> 1, other -> 1)
Note also that this approach is most likely less efficient than the one you are using, because it creates several intermediate collections. I haven't benchmarked it, but for large files it may not be the best option.
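If the intermediate collections are a concern, one possible alternative is to fold over the iterator and build the counts in a single pass. This is only a sketch and has not been benchmarked either. The countWords helper and the sample lines are hypothetical names introduced for illustration, and splitting on "\\s+" is an assumption about what should count as a separator.

```scala
object SinglePassCount {
  // Hypothetical helper: counts words from an iterator of lines in one pass,
  // without materialising the lines or tokens as a List first.
  def countWords(lines: Iterator[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))  // split on runs of whitespace
      .filter(_.nonEmpty)        // drop empty tokens (blank or indented lines)
      .foldLeft(Map.empty[String, Int]) { (acc, word) =>
        acc.updated(word, acc.getOrElse(word, 0) + 1)
      }

  def main(args: Array[String]): Unit = {
    // Made-up input resembling the question's file, with extra spaces and a blank line.
    val sample = Iterator("hello  hello", "world goodbye hello", "", "world")
    countWords(sample).foreach { case (key, value) => println(s"$key-->$value") }
  }
}
```

With a real file, the same helper could be given Source.fromFile(...).getLines() in place of the sample iterator.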
Upvotes: 1