Scala: Functional replacement for iterators over variable sized batches

Question

I'm using Scala to read columns out of our column store, Cassandra. Each column contains a number of entries, n, where n is between 10 and 20. We read entries in batches, e.g. 1000 at a time, and have to build columns from them; each entry has an ID attached that we can group by.
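Because the entries of one column arrive contiguously (the current-vs-previous ID comparison relies on this), grouping by ID within a batch amounts to splitting it into runs of consecutive equal IDs. Scala's `groupBy` is not a drop-in replacement here: it returns an unordered `Map` and would merge two non-adjacent runs that share an ID. A minimal sketch for a single in-memory batch, using `span` inside a tail-recursive loop; `Entry(id: Int, value: String)` is a hypothetical stand-in for the real row type:

```scala
import scala.annotation.tailrec

// Hypothetical entry type; the real Cassandra row type will differ.
final case class Entry(id: Int, value: String)

object Runs {
  // Split a batch into runs of consecutive entries sharing the same id.
  // Unlike groupBy, this keeps the original order and never merges
  // two non-adjacent runs that happen to share an id.
  def groupRuns(entries: List[Entry]): List[List[Entry]] = {
    @tailrec
    def loop(rest: List[Entry], acc: List[List[Entry]]): List[List[Entry]] =
      rest match {
        case Nil => acc.reverse
        case head :: _ =>
          // span takes the longest prefix with head's id (head included)
          val (run, remaining) = rest.span(_.id == head.id)
          loop(remaining, run :: acc)
      }
    loop(entries, Nil)
  }
}
```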

Currently we use an iterator to walk through the entries in a batch and detect the start of a new column by comparing the current and previous ID, and we keep reading batches until none are left. We need to store a partial column at the end of each batch because the rest of that column may be in the next batch. I've put some pseudocode below to demonstrate the basic algorithm we currently use.
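The partial column stored at the end of each batch can be made explicit state instead of a mutable buffer: fold over the batches and thread a pair of (finished columns, trailing run) through each step. A sketch assuming a hypothetical `Entry` type, each batch given as a `List[Entry]`, and a column represented simply as `List[Entry]` rather than the output of a column builder:

```scala
import scala.annotation.tailrec

// Hypothetical entry type standing in for a Cassandra row.
final case class Entry(id: Int, value: String)

object BatchFold {
  // Consecutive runs of equal ids, in their original order.
  @tailrec
  private def runs(es: List[Entry], acc: List[List[Entry]] = Nil): List[List[Entry]] =
    es match {
      case Nil => acc.reverse
      case h :: _ =>
        val (run, rest) = es.span(_.id == h.id)
        runs(rest, run :: acc)
    }

  // Fold over the batches, carrying the trailing (possibly partial) column
  // as part of the accumulator instead of mutating a buffer.
  def columns(batches: Iterator[List[Entry]]): Vector[List[Entry]] = {
    val (done, carry) =
      batches.foldLeft((Vector.empty[List[Entry]], List.empty[Entry])) {
        case ((acc, partial), batch) =>
          runs(partial ++ batch) match {
            case Nil => (acc, Nil)                 // nothing seen yet
            case rs  => (acc ++ rs.init, rs.last)  // last run may continue
          }
      }
    // The final column has no following id change, so flush it explicitly.
    if (carry.isEmpty) done else done :+ carry
  }
}
```

Every run except the last one in `partial ++ batch` is complete; the last is carried forward because its column may continue in the next batch. A column spanning more than two batches is also handled, since the carried run simply keeps growing. Note that `foldLeft` keeps all finished columns in memory, which may matter for large reads.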

How could I do this in a functional way? (If n were constant this would be a simple problem, as we could set the batch size to a multiple of n so that no column straddles two batches.)
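One functional-style option, sketched here assuming Scala 2.13+ (for `Iterator.unfold`) and the same hypothetical `Entry` type, is to make the batch boundaries disappear: flatten the batch iterator into a single lazy iterator of entries with `flatMap`, then cut that into columns. A `BufferedIterator` lets each column peek at the next entry with `head` without consuming the first entry of the following column, so no partial column has to be stored by hand:

```scala
import scala.annotation.tailrec

// Hypothetical entry type standing in for a Cassandra row.
final case class Entry(id: Int, value: String)

object LazyColumns {
  // Flatten batches into one lazy stream of entries, then group it lazily.
  // Requires Scala 2.13+ for Iterator.unfold.
  def columns(batches: Iterator[Seq[Entry]]): Iterator[List[Entry]] = {
    val entries = batches.flatMap(_.iterator).buffered

    // Collect entries while they share `id`; `head` peeks without consuming,
    // so the first entry of the next column stays in the iterator.
    @tailrec
    def takeColumn(id: Int, acc: List[Entry]): List[Entry] =
      if (entries.hasNext && entries.head.id == id) takeColumn(id, entries.next() :: acc)
      else acc.reverse

    // Emit one column per step until the entries run out.
    Iterator.unfold(()) { _ =>
      if (entries.hasNext) Some((takeColumn(entries.head.id, Nil), ()))
      else None
    }
  }
}
```

Columns are produced on demand, so only one column is held at a time. The mutation is confined to the iterator itself: the returned `Iterator` can be consumed only once, and the input `batches` iterator should not be used after calling `columns`.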

Pseudocode:

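import scala.collection.mutable.ArrayBuffer

// Entry, Column, Id, columnBuilder and batchIterator stand in for our own types
val resultBuffer = ArrayBuffer.empty[Column]  // collects all columns
val columnBuffer = ArrayBuffer.empty[Entry]   // collects entries for current column
var currentId: Option[Id] = None              // id of current column (None before the first entry)

while (batchIterator.hasNext) {
  val batch = batchIterator.next()
  val entryIterator = batch.entries.iterator

  while (entryIterator.hasNext) {
    val entry = entryIterator.next()
    if (!currentId.contains(entry.id)) {
      // a new column starts: emit the previous one, unless this is the very first entry
      if (columnBuffer.nonEmpty) resultBuffer += columnBuilder(columnBuffer.toList)
      columnBuffer.clear()
      currentId = Some(entry.id)
    }
    columnBuffer += entry
  }
}
// the last column is never followed by an id change, so emit it here
if (columnBuffer.nonEmpty) resultBuffer += columnBuilder(columnBuffer.toList)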
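For comparison, the loop above translates almost mechanically into a `foldLeft` over the flattened entries: the three mutable variables (`resultBuffer`, `columnBuffer`, `currentId`) become fields of one immutable state value, and the final flush becomes an explicit step after the fold. A sketch assuming a hypothetical `Entry` type, batches given as `Iterator[Seq[Entry]]`, and columns kept as plain `List[Entry]` (a column builder could be mapped over the result):

```scala
// Hypothetical entry type standing in for a Cassandra row.
final case class Entry(id: Int, value: String)

object FoldTranslation {
  // The loop's three mutable variables become one immutable state value.
  private final case class State(
    done: Vector[List[Entry]], // resultBuffer: finished columns
    current: List[Entry],      // columnBuffer: current column, newest first
    currentId: Option[Int]     // id of the current column, if any
  )

  def columns(batches: Iterator[Seq[Entry]]): Vector[List[Entry]] = {
    val end = batches.flatMap(_.iterator).foldLeft(State(Vector.empty, Nil, None)) {
      (s, entry) =>
        if (s.currentId.contains(entry.id))
          s.copy(current = entry :: s.current) // same column: extend it
        else {
          // new id: close the previous column (if any) and start a new one
          val done = if (s.current.isEmpty) s.done else s.done :+ s.current.reverse
          State(done, List(entry), Some(entry.id))
        }
    }
    // flush the final column, which no later id change would trigger
    if (end.current.isEmpty) end.done else end.done :+ end.current.reverse
  }
}
```

Prepending to `current` and reversing once per column keeps each step constant-time on the immutable `List`.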
