I have a very long collection I need to iterate over in Scala, and I want to avoid keeping it all in memory. The solution I came up with is this:
(rows is the iterator I am trying to process, and COMPONENT_LIMIT is my estimate of how many objects I can keep in memory.)
```scala
val ( processItr, countItr ) = rows.duplicate
val pastLimitItr = countItr.drop( COMPONENT_LIMIT )
if ( pastLimitItr.hasNext )
  new CustomIterator( processItr.buffered )
else
  Iterator( MappperToObject.createObject(
    processItr.toList
  ) )
```
The problem I have is this: even though I no longer need pastLimitItr, as far as I can tell from the Scala source of def duplicate, its internal queue will hang around, so the memory used will grow with the length of the iterator.
The question is: how can I get rid of the queue in the Partner object in def duplicate after I am done with the test? I do not need the duplicate at all after the test.
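A small experiment makes the queue visible. This is a sketch under assumptions: a 10-row stand-in for rows, a hypothetical componentLimit of 3 in place of COMPONENT_LIMIT, and a pulled counter that only exists to observe how far the source has been read. After processItr is fully consumed, the "unused" side still yields rows 4 to 10, which means duplicate has kept them in its shared queue.

```scala
object DuplicateQueueDemo {
  // Returns (result of the size test, rows read from the source after the test,
  // everything processItr saw, what the unused duplicate still holds afterwards).
  def run(): (Boolean, Int, List[Int], List[Int]) = {
    var pulled = 0 // counts how many rows the source has produced so far
    val rows = Iterator.range(1, 11).map { x => pulled += 1; x }
    val componentLimit = 3 // hypothetical stand-in for COMPONENT_LIMIT

    val (processItr, countItr) = rows.duplicate
    val pastLimitItr = countItr.drop(componentLimit)

    val tooBig = pastLimitItr.hasNext // more than componentLimit rows?
    val pulledAfterTest = pulled      // rows 1..3 now wait in the queue for processItr

    val processed = processItr.toList  // running ahead queues rows 4..10 for the other side
    val leftover = pastLimitItr.toList // never needed again, yet it still holds rows 4..10
    (tooBig, pulledAfterTest, processed, leftover)
  }
}
```

The leftover list is exactly the memory the question is about: as long as processItr runs ahead, every row it reads is also enqueued for the duplicate you no longer care about.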
UPDATE: I should have added that each object produced by the output iterator will contain a subset of the input objects, selected by their content, so I cannot use grouped as suggested.
UPDATE: It looks like span is the right answer out of the options given in the answer. I was probably not specific enough in my question.
It sounds like you want to use:
```scala
val segments = iterator.grouped(LIMIT)
createObject(segments.next())
```
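A runnable sketch of the grouped idea, assuming a chunk size of 3 and a hypothetical createObject that just sums its chunk (standing in for the question's MappperToObject.createObject). Only one chunk is materialized at a time, and the last chunk is shorter when the row count is not a multiple of the chunk size.

```scala
object GroupedDemo {
  // Hypothetical stand-in for MappperToObject.createObject: one object per chunk.
  def createObject(chunk: Seq[Int]): Int = chunk.sum

  def run(): List[Int] = {
    val rows = Iterator.range(1, 11) // 10 rows
    val segments = rows.grouped(3)   // lazily yields chunks of at most 3 rows
    segments.map(createObject).toList
  }
}
```

Here GroupedDemo.run() gives List(6, 15, 24, 10): three full chunks and a final chunk holding only row 10.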
Though if you did need duplicate, you could drain the duplicates.
You can also use iterator.span with a condition that counts:
```scala
scala> val it = (1 to 10).iterator
it: Iterator[Int] = non-empty iterator

scala> var n = 0 ; val (vs, rest) = it.span { _ => n += 1; n < 3 }
n: Int = 0
vs: Iterator[Int] = non-empty iterator
rest: Iterator[Int] = unknown-if-empty iterator

scala> vs.toList
res0: List[Int] = List(1, 2)

scala> rest.toList
res1: List[Int] = List(3, 4, 5, 6, 7, 8, 9, 10)
```
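One detail of span that matters for the memory question: both halves are lazy, so if you read rest before vs, span has to keep the unread prefix in an internal queue so that vs can still deliver it. A small sketch with the same counting predicate as the session above, reading rest first:

```scala
object SpanOrderDemo {
  // Consumes `rest` first, then `vs`; returns (vs contents, rest contents).
  def run(): (List[Int], List[Int]) = {
    val it = Iterator.range(1, 11)
    var n = 0
    val (vs, rest) = it.span { _ => n += 1; n < 3 }
    val restList = rest.toList // forces span to read past the prefix (1, 2)...
    val vsList = vs.toList     // ...which vs can still return only because span kept it
    (vsList, restList)
  }
}
```

So span keeps memory bounded only when the prefix stays small or is consumed first; with a counting predicate like this one, the prefix is at most the count you pass in.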
You could define that as a splitAt method on Iterator:
```scala
scala> implicit class splitItAt[A](it: Iterator[A]) {
     |   def splitAt(i: Int): (Iterator[A], Iterator[A]) = {
     |     var n = 0
     |     it.span { _ => n += 1; n <= i }
     |   }}
defined class splitItAt

scala> val (is, rest) = (1 to 10).iterator.splitAt(6)
is: Iterator[Int] = non-empty iterator
rest: Iterator[Int] = unknown-if-empty iterator

scala> is.toList
res2: List[Int] = List(1, 2, 3, 4, 5, 6)
```
But I see you actually want to use either the prefix or the remaining iterator.
I'd write a custom method. Or don't laugh:
```scala
scala> val (is, rest) = (1 to 10).iterator.splitAt(6)
is: Iterator[Int] = non-empty iterator
rest: Iterator[Int] = unknown-if-empty iterator

scala> is match { case it: collection.Iterator$Leading$1 if rest.hasNext => it.finish() ; rest ; case _ => is }
res6: Iterator[Int] = unknown-if-empty iterator

scala> res6.next
res7: Int = 7
```
That internal finish() means you can use rest without holding on to the prefix: finish() still reads the prefix ahead, but only into is's own queue, which you are discarding.
And you can also cheat with grouped, as currently implemented, and use the original iterator for the rest:
```scala
scala> val it = (1 to 10).iterator
it: Iterator[Int] = non-empty iterator

scala> val g = it.grouped(3)
g: it.GroupedIterator[Int] = non-empty iterator

scala> val first = g.next
first: List[Int] = List(1, 2, 3)

scala> it.hasNext
res12: Boolean = true

scala> it.next
res13: Int = 4
```
The custom method with no internals to hold onto:
```scala
scala> :pa
// Entering paste mode (ctrl-D to finish)

implicit class splitItAt[A](private val it: Iterator[A]) extends AnyVal {
  def splitAt(i: Int): (List[A], Iterator[A]) = {
    // Buffer at most i elements; everything after them stays in the original iterator.
    val buf = scala.collection.mutable.ListBuffer.empty[A]
    var n = 0
    while (it.hasNext && n < i) {
      buf += it.next()
      n += 1
    }
    (buf.toList, it)
  }
}

// Exiting paste mode, now interpreting.

defined class splitItAt

scala> val (is, rest) = (1 to 10).iterator.splitAt(20)
is: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
rest: Iterator[Int] = empty iterator

scala> val (is, rest) = (1 to 10).iterator.splitAt(6)
is: List[Int] = List(1, 2, 3, 4, 5, 6)
rest: Iterator[Int] = non-empty iterator

scala> val (is, rest) = (1 to 10).iterator.splitAt(0)
is: List[Int] = List()
rest: Iterator[Int] = non-empty iterator
```
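Applied to the question's decision, the custom method replaces duplicate entirely: read at most COMPONENT_LIMIT rows into a list, and if the source still has more, stream the list followed by the untouched remainder. This is a sketch, not the asker's code: streamAll and createObject are hypothetical stand-ins for CustomIterator and MappperToObject.createObject, and the method is renamed to the hypothetical bufferedSplitAt because, as far as I know, Scala 2.13 added a splitAt member to Iterator, and a real member always wins over an implicit extension.

```scala
import scala.collection.mutable

object RowsSplitter {
  // Same logic as the custom splitAt above, under a hypothetical name that
  // cannot be shadowed by a built-in Iterator.splitAt.
  implicit class BufferedSplit[A](private val it: Iterator[A]) extends AnyVal {
    def bufferedSplitAt(i: Int): (List[A], Iterator[A]) = {
      val buf = mutable.ListBuffer.empty[A]
      var n = 0
      while (it.hasNext && n < i) {
        buf += it.next()
        n += 1
      }
      (buf.toList, it)
    }
  }

  // Hypothetical stand-ins for the question's CustomIterator and MappperToObject.createObject.
  def streamAll(rows: Iterator[Int]): Iterator[String] = rows.map(r => s"row $r")
  def createObject(rows: List[Int]): String = rows.mkString("small(", ",", ")")

  def process(rows: Iterator[Int], limit: Int): Iterator[String] = {
    val (prefix, rest) = rows.bufferedSplitAt(limit) // holds at most `limit` rows
    if (rest.hasNext) streamAll(prefix.iterator ++ rest) // too many: stream all rows, prefix first
    else Iterator(createObject(prefix))                  // few enough: build one object from the list
  }
}
```

Unlike the duplicate version, nothing here keeps a second cursor alive, so once the prefix list has been streamed the only memory in play is whatever streamAll itself holds.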