Pete Montgomery

Reputation: 4100

Scala Seq.grouped eating my iterator

As a C# programmer I have a sketchy understanding of Java / Scala iterator design.

I am trying to (lazily - for the source may be big) read records from a RecordReader (in some third party library). I need to do some additional work every 100 records.

for (group <- reader.iterator.zipWithIndex.grouped(100)) {
  for ((record, i) <- group) {
    println(i + "|" + record.key)
  }
  // ...
}

This gives me the very last record, repeatedly, each time.

If I don't use grouped, it works fine and I get each record. Am I missing something about lazy streaming or Java iterators?

Upvotes: 3

Views: 2141

Answers (2)

Régis Jean-Gilles

Reputation: 32719

I think the problem might be that Record.key just returns the current value of some variable that is mutated as the iterator is consumed (as opposed to the record capturing the key value at construction time). An example will probably make it clearer. First, let's use the Scala REPL to cook up some test code that does not exhibit the problem:

case class Record( key: Int )
def getRecordIterator: Iterator[Record] = {
  var currentKey: Int = 0
  (1 to 10).iterator.map{ i => 
    currentKey += 1
    new Record( currentKey )
  }
}

Then we can try to iterate without using grouped:

for ((record, i) <- getRecordIterator.zipWithIndex) {
  println(i + "|" + record)
}

This gives us (as expected)

0|Record(1)
1|Record(2)
2|Record(3)
3|Record(4)
4|Record(5)
5|Record(6)
6|Record(7)
7|Record(8)
8|Record(9)
9|Record(10)

And then using grouped:

for (group <- getRecordIterator.zipWithIndex.grouped(3)) {
  for ((record, i) <- group) {
    println(i + "|" + record)
  }
  println("---")
}

Which gives:

0|Record(1)
1|Record(2)
2|Record(3)
---
3|Record(4)
4|Record(5)
5|Record(6)
---
6|Record(7)
7|Record(8)
8|Record(9)
---
9|Record(10)
---    

Until now, all is well.

Now let's change the definition of Record slightly:

trait Record {
  def key: Int
  override def toString = "Record(" + key + ")"
}
def getRecordIterator: Iterator[Record] = {
  var currentKey: Int = 0
  (1 to 10).iterator.map{ i => 
    currentKey += 1
    new Record{ def key = currentKey }
  }    
}

With this change, we still get the same result when not using grouped, but here is what we get when we do use grouped:

0|Record(3)
1|Record(3)
2|Record(3)
---
3|Record(6)
4|Record(6)
5|Record(6)
---
6|Record(9)
7|Record(9)
8|Record(9)
---
9|Record(10)
---

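The source of the problem is that merely calling next on our iterator mutates the value returned by Record.key. grouped pulls a whole group from the iterator before handing it to the loop body, so by the time we print, every record in the group reports the key of the last record fetched. The problem can be illustrated even more trivially: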

val it = getRecordIterator
val r1 = it.next()
println(r1) // prints "Record(1)" as expected
val r2 = it.next()
println(r2) // prints "Record(2)" as expected
println(r1) // this now prints "Record(2)", not "Record(1)" anymore!
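
If this is indeed what the third-party reader does, one possible workaround is to copy whatever you need out of each record before grouped gets a chance to buffer it. Below is a minimal sketch that reuses the mutable Record from the example above. Snapshot is a hypothetical name for the immutable copy, and the fields you need to copy from the real library's record are an assumption on my part:

```scala
// Stand-in for the third-party record: key reads shared mutable state.
trait Record { def key: Int }

def getRecordIterator: Iterator[Record] = {
  var currentKey: Int = 0
  (1 to 10).iterator.map { _ =>
    currentKey += 1
    new Record { def key = currentKey }
  }
}

// Hypothetical immutable copy holding only the data we need.
final case class Snapshot(key: Int)

val groups: List[List[(Snapshot, Int)]] =
  getRecordIterator
    .map(r => Snapshot(r.key)) // read the key while this record is still the current one
    .zipWithIndex
    .grouped(3)
    .map(_.toList)
    .toList

// Keys per group are now 1,2,3 / 4,5,6 / 7,8,9 / 10
for (group <- groups) {
  for ((snapshot, i) <- group) println(i + "|" + snapshot)
  println("---")
}
```

Because map on an iterator is lazy, the copy is taken one record at a time as grouped pulls elements, so the source is still streamed rather than loaded in full.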

Upvotes: 3

huynhjl

Reputation: 41646

To troubleshoot, try decorating your iterator with another iterator that prints what is going on:

def wrap[T](i: Iterator[T]) = new Iterator[T] {
  def hasNext = { val b = i.hasNext; println("hasNext => " + b); b }
  def next() = { val n = i.next(); println("next() => " + n); n }
}

val reader = Iterator.from(20).take(10).toList
for (group <- wrap(reader.iterator).zipWithIndex.grouped(5)) {
  for ((v, i) <- group) println("[" + i + "] = " + v)
}

Call wrap on the iterator as soon as you create it. This will print something like:

hasNext => true
hasNext => true
next() => 20
hasNext => true
next() => 21
hasNext => true

This should help you determine whether the iterator is ill-behaved. It could be, for instance, that the library does not deal correctly with hasNext being called several times without a call to next in between. In that case you can modify wrap so that it makes the iterator behave correctly. One more thing: from the symptoms, it feels like you have already consumed the iterator before grouped is called. So be extra careful and check whether you have used the same iterator reference before.
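
If the trace does show that repeated hasNext calls are the culprit, wrap could be adapted along these lines. This is only a sketch: cacheHasNext is a name I made up, and it assumes the underlying iterator is otherwise fine as long as hasNext is asked exactly once before each next:

```scala
// Hypothetical helper: queries the underlying hasNext at most once per element
// and remembers the answer until next() is called.
def cacheHasNext[T](underlying: Iterator[T]): Iterator[T] = new Iterator[T] {
  private var cached: Option[Boolean] = None

  def hasNext: Boolean = {
    if (cached.isEmpty) cached = Some(underlying.hasNext)
    cached.get
  }

  def next(): T = {
    // Make sure the underlying hasNext has been consulted for this element.
    if (!hasNext) throw new NoSuchElementException("next on empty iterator")
    cached = None
    underlying.next()
  }
}
```

You would then write cacheHasNext(reader.iterator).zipWithIndex.grouped(100), so that grouped can call hasNext as often as it likes without disturbing the reader.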

Upvotes: 3
