Reputation: 1391
I'm asking this question solely to better understand how kotlin sequences work. I thought I had a solid grasp, but I cannot explain what I observed in a short test by what I know, so obviously I have a misconception somewhere.
My goal was to do a quick benchmark to compare the performance of lists vs. sequences when filtering for a criteria and then taking the maximum value of the result. This is an operation that occurs fairly often in some code I have, and I'm trying to decide whether or not it's worth rewriting it to use sequences instead of lists. It seems it would be, as sequence is consistently faster, but that is not the question here.
Rather, I would ask you to explain to me how the below described "artifact" can come about.
First of all, here's the complete test I ran:
fun `just checking the performance of sequences`() {
val log = logger()
var totaldif = 0L
var seqFasterThanList = 0
(1..1000).forEach {
val testList = (1..6000000).toList().shuffled()
val testSequence = testList.asSequence()
// filter and find max
val listDuration = measureTimeMillis {
testList.filter { it % 2 == 0 }.max()
}
val sequenceDuration = measureTimeMillis {
testSequence.filter { it % 2 == 0 }.max()
}
log.info("List: {} ms; Sequence: {} ms;", listDuration, sequenceDuration)
if (sequenceDuration < listDuration) {
seqFasterThanList++
totaldif += (listDuration - sequenceDuration)
}
}
log.info("sequence was faster {} out of 1000 runs. average difference: {} ms",
seqFasterThanList, totaldif / seqFasterThanList)
}
The results mostly looked like this:
List: 299 ms; Sequence: 217 ms;
List: 289 ms; Sequence: 211 ms;
List: 288 ms; Sequence: 220 ms;
Except, every once in a while, about 1 in 20, the result looked more like this:
List: 380 ms; Sequence: 63 ms
As you can see, in these cases the operation was vastly faster. This is the kind of behaviour I would expect on operations like find or first, which can terminate early once they find a match. But by its very nature, max has to traverse the entire sequence to guarantee the result. So how is it possible that some times it can find a result more than 3 times as fast as it usually requires, with the same number of elements to traverse?
Upvotes: 0
Views: 221
Reputation: 7016
Further down is my original answer which, as @Slaw pointed out, wasn't actually answering what you asked (it was explaining why Sequence.filter
is faster than Iterable.filter
, not why Sequence.filter
seems to be intermittently faster than it normally is). However, I'm leaving it below as it's related to what I think might be the answer to your actual question.
My guess is this might be related to garbage collection. As you can see from my original answer, when you call Iterable.filter
you are causing lots of arrays to be copied, i.e. you're putting lots of stuff in memory, which has to be cleaned up at certain points. I wonder if it's this cleanup of stuff in memory created by the List
tests, which is actually causing the anomalies. I think what might be happening is that every so often the garbage collector kicks in and does a full collection: this is causing the List
test to slow down to slower than normal. And after this runs the memory is all cleaned up, which might be why the Sequence
test is faster that time.
And the reason I suspect it's related to garbage collection is because I replicated your anomalies, then made one change: instead of calling testList.filter
I call testList.filterTo
, passing in an ArrayList
of the same size as the list. That means that no array copying has to happen, and also the creation of the ArrayList
is now outside of the timing:
val arrayList = ArrayList<Int>(testList.size)
val listDuration = measureTimeMillis {
testList.filterTo(arrayList) { it % 2 == 0 }.max()
}
As soon as I did that, the anomalies disappeared. Maybe you can check on your system and see if this makes the anomalies disappear there too. It's intermittent so a bit difficult to know for sure.
This doesn't prove that it's garbage collection, of course, but I think it makes it a possible culprit. You could turn on GC logging to see if you wanted to know for sure. If you do, let us know what you find: it would be interesting to hear your results.
Original answer below (explaining why Iterable.filter
is slower than Sequence.filter
)
If you look at the source code for Iterable<T>.filter
you'll see it does this:
public inline fun <T> Iterable<T>.filter(predicate: (T) -> Boolean): List<T> {
return filterTo(ArrayList<T>(), predicate)
}
It creates a new ArrayList
then loops round the items, checking the predicate against each one, and adding them to that array list if they match the predicate. This means that every X items (whatever the array list's default size is), the array list has to resize itself to allow more items in (i.e. create a new copy of the underlying array in which it's storing all its data).
In a sequence, however, the code is different:
public fun <T> Sequence<T>.filter(predicate: (T) -> Boolean): Sequence<T> {
return FilteringSequence(this, true, predicate)
}
Here there isn't some underlying array storing all the items, so no copying of arrays has to take place. Instead, there's just an Iterator
which will return the next item which matches the predicate whenever next
is called.
You can see the details of how this is implemented in the FilteringSequence
class:
internal class FilteringSequence<T>(
private val sequence: Sequence<T>,
private val sendWhen: Boolean = true,
private val predicate: (T) -> Boolean
) : Sequence<T> {
override fun iterator(): Iterator<T> = object : Iterator<T> {
val iterator = sequence.iterator()
var nextState: Int = -1 // -1 for unknown, 0 for done, 1 for continue
var nextItem: T? = null
private fun calcNext() {
while (iterator.hasNext()) {
val item = iterator.next()
if (predicate(item) == sendWhen) {
nextItem = item
nextState = 1
return
}
}
nextState = 0
}
override fun next(): T {
if (nextState == -1)
calcNext()
if (nextState == 0)
throw NoSuchElementException()
val result = nextItem
nextItem = null
nextState = -1
@Suppress("UNCHECKED_CAST")
return result as T
}
Upvotes: 4