Reputation: 3109
I am processing huge numbers of XML documents and am wondering how to dynamically flush a sequence to disk when the memory footprint gets too large.
The ByteArray size of the XMLs varies hugely. Sometimes I cannot run more than 2 documents before running out of memory, but most of the time it takes hundreds of documents.
Is it possible to accomplish the following without using for/while-loops?
val listOfDocumentIds: List<Long> = ...
listOfDocumentIds
    .asSequence()
    // ... read documents until the memory footprint limit is reached
    // ... process the documents read so far
    // ... write the processed documents to disk
    // ... loop back to where reading halted and continue reading/processing
Upvotes: 0
Views: 52
Reputation: 3455
The answer is yes - except I don't know how you'd know when your memory limit is about to be reached.
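One rough heuristic, if you want something concrete, is to compare the used heap against the maximum heap via the JVM's Runtime API; the 80% threshold below is an arbitrary choice, so treat this as a sketch rather than a reliable limit detector:
fun memoryGettingFull(threshold: Double = 0.8): Boolean {
    val runtime = Runtime.getRuntime()
    // Used heap = currently allocated heap minus the free part of it
    val used = runtime.totalMemory() - runtime.freeMemory()
    // Compare against the maximum heap the JVM may grow to (-Xmx)
    return used.toDouble() / runtime.maxMemory() > threshold
}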
fun processXmlDocs(listOfDocumentIds: List<Long>): Sequence<XmlDoc> = sequence {
    listOfDocumentIds.forEach { docId ->
        val doc = XmlDoc() // load and process document docId here
        yield(doc)
    }
}
fun main() {
    val listOfDocumentIds: List<Long> = listOf(1L, 2L, 3L)
    val xmlDocsToPersist = mutableListOf<XmlDoc>() // this is the memory buffer
    processXmlDocs(listOfDocumentIds).forEach { xmlDoc ->
        xmlDocsToPersist.add(xmlDoc)
        if (MEMORY_GETTING_FULL) { // placeholder condition, e.g. the heuristic above
            persistAll(xmlDocsToPersist)
            xmlDocsToPersist.clear()
        }
    }
    // drain down the last chunk
    persistAll(xmlDocsToPersist)
}
class XmlDoc
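If you would rather not manage the buffer and the if-check yourself, the standard library's Sequence.chunked can do the batching. This sketch assumes a fixed chunk size (100 documents, a stand-in for a real memory limit) and assumes persistAll accepts any List<XmlDoc>:
fun processAndPersistChunked(listOfDocumentIds: List<Long>) {
    processXmlDocs(listOfDocumentIds)
        .chunked(100) // lazily groups the sequence into lists of up to 100 docs
        .forEach { chunk -> persistAll(chunk) } // persist each chunk as it fills
}
Because the sequence is lazy, each chunk is materialized only when forEach pulls it, so at most one chunk of documents is buffered in memory at a time.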
Yes, this still has a loop (the forEach), which I am sure you can simplify further, for instance along the lines of the chunked sketch above. But it conveys how to use Sequences alongside forEach,
which is a terminal operation: such operations are what actually cause the sequence to produce data.
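To illustrate that last point, a tiny example: the map step prints nothing until the terminal forEach pulls elements through the pipeline.
fun demoLaziness() {
    val doubled = sequenceOf(1, 2, 3)
        .map { println("mapping $it"); it * 2 } // intermediate op: deferred
    println("nothing mapped yet") // map has not run at this point
    doubled.forEach { println("got $it") } // terminal op: elements flow through now
}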
Upvotes: 0