Naveen
Naveen

Reputation: 473

Efficient way to iterate collection objects in Groovy 2.x?

What is the best and fastest way to iterate over Collection objects in Groovy. I know there are several Groovy collection utility methods. But they use closures which are slow.

Upvotes: 1

Views: 1698

Answers (1)

Szymon Stepniak
Szymon Stepniak

Reputation: 42184

The final result in your specific case might be different, however benchmarking 5 different iteration variants available for Groovy shows that old Java for-each loop is the most efficient one. Take a look at the following example where we iterate over 100 millions of elements and we calculate the total sum of these numbers in the very imperative way:

@Grab(group='org.gperfutils', module='gbench', version='0.4.3-groovy-2.4')

import java.util.concurrent.atomic.AtomicLong
import java.util.function.Consumer

def numbers = (1..100_000_000)

def r = benchmark {
    'numbers.each {}' {
        final AtomicLong result = new AtomicLong()
        numbers.each { number -> result.addAndGet(number) }
    }

    'for (int i = 0 ...)' {
        final AtomicLong result = new AtomicLong()
        for (int i = 0; i < numbers.size(); i++) {
            result.addAndGet(numbers[i])
        }
    }

    'for-each' {
        final AtomicLong result = new AtomicLong()
        for (int number : numbers) {
            result.addAndGet(number)
        }
    }

    'stream + closure' {
        final AtomicLong result = new AtomicLong()
        numbers.stream().forEach { number -> result.addAndGet(number) }
    }

    'stream + anonymous class' {
        final AtomicLong result = new AtomicLong()
        numbers.stream().forEach(new Consumer<Integer>() {
            @Override
            void accept(Integer number) {
                result.addAndGet(number)
            }
        })
    }
}
r.prettyPrint()

This is just a simple example where we try to benchmark the cost of iteration over a collection, no matter what the operation executed for every element from collection is (all variants use the same operation to give the most accurate results). And here are results (time measurements are expressed in nanoseconds):

Environment
===========
* Groovy: 2.4.12
* JVM: OpenJDK 64-Bit Server VM (25.181-b15, Oracle Corporation)
    * JRE: 1.8.0_181
    * Total Memory: 236 MB
    * Maximum Memory: 3497 MB
* OS: Linux (4.18.9-100.fc27.x86_64, amd64)

Options
=======
* Warm Up: Auto (- 60 sec)
* CPU Time Measurement: On

WARNING: Timed out waiting for "numbers.each {}" to be stable
                                user    system         cpu        real

numbers.each {}           7139971394  11352278  7151323672  7246652176
for (int i = 0 ...)       6349924690   5159703  6355084393  6447856898
for-each                  3449977333    826138  3450803471  3497716359
stream + closure          8199975894    193599  8200169493  8307968464
stream + anonymous class  3599977808   3218956  3603196764  3653224857

Conclusion

  1. Java's for-each is as fast as Stream + anonymous class (Groovy 2.x does not allow using lambda expressions).
  2. The old for (int i = 0; ... is almost twice slower comparing to for-each - most probably because there is an additional effort of returning a value from the array at given index.
  3. Groovy's each method is a little bit faster then stream + closure variant, and both are more than twice slower comparing to the fastest one.

It's important to run benchmarks for a specific use case to get the most accurate answer. For instance, Stream API will be most probably the best choice if there are some other operations applied next to the iteration (filtering, mapping etc.). For simple iterations from the first to the last element of a given collection choosing old Java for-each might give the best results, because it does not produce much overhead.

Also - the size of collection matters. For instance, if we use the above example but instead of iterating over 100 millions of elements we would iterate over 100k elements, then the slowest variant would cost 0.82 ms versus 0.38 ms. If you build a system where every nanosecond matters then you have to pick the most efficient solution. But if you build a simple CRUD application then it doesn't matter if iteration over a collection takes 0.82 or 0.38 milliseconds - the cost of database connection is at least 50 times bigger, so saving approximately 0.44 milliseconds would not make any impact.

// Results for iterating over 100k elements
Environment
===========
* Groovy: 2.4.12
* JVM: OpenJDK 64-Bit Server VM (25.181-b15, Oracle Corporation)
    * JRE: 1.8.0_181
    * Total Memory: 236 MB
    * Maximum Memory: 3497 MB
* OS: Linux (4.18.9-100.fc27.x86_64, amd64)

Options
=======
* Warm Up: Auto (- 60 sec)
* CPU Time Measurement: On

                            user  system     cpu    real

numbers.each {}           717422       0  717422  722944
for (int i = 0 ...)       593016       0  593016  600860
for-each                  381976       0  381976  387252
stream + closure          811506    5884  817390  827333
stream + anonymous class  408662    1183  409845  416381

UPDATE: Dynamic invocation vs static compilation

There is also one more factor worth taking into account - static compilation. Below you can find results for 10 millions element collection iterations benchmark:

Environment
===========
* Groovy: 2.4.12
* JVM: OpenJDK 64-Bit Server VM (25.181-b15, Oracle Corporation)
    * JRE: 1.8.0_181
    * Total Memory: 236 MB
    * Maximum Memory: 3497 MB
* OS: Linux (4.18.10-100.fc27.x86_64, amd64)

Options
=======
* Warm Up: Auto (- 60 sec)
* CPU Time Measurement: On

                                   user   system        cpu       real

Dynamic each {}               727357070        0  727357070  731017063
Static each {}                141425428   344969  141770397  143447395
Dynamic for-each              369991296   619640  370610936  375825211
Static for-each                92998379    27666   93026045   93904478
Dynamic for (int i = 0; ...)  679991895  1492518  681484413  690961227
Static for (int i = 0; ...)   173188913        0  173188913  175396602

As you can see turning on static compilation (with @CompileStatic class annotation for instance) is a game changer. Of course Java for-each is still the most efficient, however its static variant is almost 4 times faster than the dynamic one. Static Groovy each {} is faster 5 times faster than the dynamic each {}. And static for loop is also 4 times faster then the dynamic for loop.

Conclusion - for 10 millions elements static numbers.each {} takes 143 milliseconds while static for-each takes 93 milliseconds for the same size collection. It means that for collection of size 100k static numbers.each {} will cost 0.14 ms and static for-each will take 0.09 ms approximately. Both are very fast and the real difference starts when the size of collection explodes to +100 millions of elements.

Java stream from Java compiled class

And to give you a perspective - here is Java class with stream().forEach() on 10 millions of elements for a comparison:

Java stream.forEach()          87271350   160988   87432338   88563305

Just a little bit faster than statically compiled for-each in Groovy code.

Upvotes: 4

Related Questions