Reputation: 13547
I am sifting through a large data set, parsing records and grouping them by key. But to use the groupBy function I first need to convert my iterator to an Array. Why is groupBy not present on Iterator? I understand how an iterator works and that it can traverse its elements only once. But when methods like map, filter, foreach, etc. are provided on Iterator, why not provide groupBy as well?
Is there any specific reason for this? Converting an iterator to an Array takes extra time when you work with large data.
Upvotes: 0
Views: 785
Reputation: 22449
One approach to avoid loading the entire dataset into an Array or List from an Iterator is to use foldLeft to assemble the aggregated Map. Below is an example of computing the sum of values by key via foldLeft from an Iterator:
val it = Iterator(("a", 1), ("a", 2), ("b", 3), ("b", 4), ("c", 5))
it.foldLeft(Map.empty[String, Int]) { case (m, (k, v)) =>
  m + (k -> (m.getOrElse(k, 0) + v))
}
// res1: scala.collection.immutable.Map[String,Int] = Map(a -> 3, b -> 7, c -> 5)
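If you need the grouped values themselves rather than a running sum, the same foldLeft pattern can probably be adapted to collect them per key. The following is only a sketch under that assumption: the names it2 and grouped are made up for illustration, and Vector is just one reasonable choice of container. Note that the resulting Map still holds every element in memory, which any groupBy-style result has to do.

```scala
// Same sample data as above; a fresh Iterator is needed because the
// first one has already been consumed by the previous foldLeft.
val it2 = Iterator(("a", 1), ("a", 2), ("b", 3), ("b", 4), ("c", 5))

// Single pass over the Iterator: append each value to the Vector
// stored under its key, creating an empty Vector on first sight.
val grouped = it2.foldLeft(Map.empty[String, Vector[Int]]) { case (m, (k, v)) =>
  m.updated(k, m.getOrElse(k, Vector.empty[Int]) :+ v)
}
// grouped == Map("a" -> Vector(1, 2), "b" -> Vector(3, 4), "c" -> Vector(5))
```

This avoids the intermediate Array, but it does not reduce the size of the final result. If only an aggregate per key is needed, the summing version above keeps far less in memory.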
Re: problem with groupBy on an Iterator, here's a relevant SO link and Scala-lang link.
Upvotes: 2