Ashwin

Reputation: 13547

Scala Iterator vs Other collections?

I am sifting through a large data set, parsing it and grouping records by key. To use the groupBy function, however, I need to convert my iterator to an Array. Why is groupBy not available on Iterator? I understand how an iterator works and that it can traverse its elements only once. But if Iterator provides methods like map, filter, and foreach, why not groupBy as well?
Is there a specific reason for this? Converting an iterator to an Array takes extra time when working with large data.

Upvotes: 0

Views: 785

Answers (1)

Leo C

Reputation: 22449

One way to avoid loading the entire dataset from an Iterator into an Array or List is to build the aggregated Map with foldLeft. The example below sums values by key directly from an Iterator:

val it = Iterator(("a", 1), ("a", 2), ("b", 3), ("b", 4), ("c", 5))

it.foldLeft(Map.empty[String, Int]){ case (m, (k, v)) =>
  m + (k -> (m.getOrElse(k, 0) + v))
}
// res1: scala.collection.immutable.Map[String,Int] = Map(a -> 3, b -> 7, c -> 5)
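
The same foldLeft pattern can be extended to collect the elements themselves per key, which roughly mimics what groupBy returns. The following is only a sketch: groupByKey is a hypothetical helper name, not a standard library method, and it assumes the grouped result fits in memory even though the input is never copied into an intermediate Array.

```scala
// Hypothetical helper (not part of the Scala standard library):
// groups the elements of an Iterator by key in a single pass.
def groupByKey[A, K](it: Iterator[A])(key: A => K): Map[K, Vector[A]] =
  it.foldLeft(Map.empty[K, Vector[A]]) { (m, a) =>
    val k = key(a)
    // Append the element to the Vector already stored for this key, if any.
    m.updated(k, m.getOrElse(k, Vector.empty[A]) :+ a)
  }

val pairs = Iterator(("a", 1), ("a", 2), ("b", 3), ("b", 4), ("c", 5))
val grouped = groupByKey(pairs)(_._1)
```

Note that the iterator is consumed by the call, and every group still ends up in memory; the saving is only that the raw input is not materialized as an Array first.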

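Regarding why groupBy is not defined on Iterator: unlike map or filter, its result cannot be produced lazily, because every element must be consumed and every group held in memory before anything is returned. For more detail, here's a relevant SO link and Scala-lang link.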

Upvotes: 2
