Reputation: 464

How Stream is more efficient?

I am trying to digest Stream package and seems like it's very difficult for me to understand.

I was reading Stream package documentation and at a point I tried to implement it to learn by doing. This is the text I have read:

Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.

I understand this much that they provide a new Stream, so my first question is, Is creating a stream without traversing a heavy operation?

Now, since intermediate operations are lazy and terminal operations are eager and also streams are meant to be efficient than old programming standards of if-else and more readable.

Processing streams lazily allows for significant efficiencies; in a pipeline such as the filter-map-sum example above, filtering, mapping, and summing can be fused into a single pass on the data, with minimal intermediate state. Laziness also allows avoiding examining all the data when it is not necessary; for operations such as "find the first string longer than 1000 characters", it is only necessary to examine just enough strings to find one that has the desired characteristics without examining all of the strings available from the source. (This behavior becomes even more important when the input stream is infinite and not merely large.)

To demonstrate this, I started implemented a small program to understand the concept. Here is the program:

        List<String> stringList = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            stringList.add("String" + i);
        }
        long   start  = System.currentTimeMillis();
        Stream stream = stringList.stream().filter(s -> s.contains("99"));
        long   midEnd = System.currentTimeMillis();
        System.out.println("Time is millis before applying terminal operation: " + (midEnd - start));
        System.out.println(stream.findFirst().get());
        long end = System.currentTimeMillis();
        System.out.println("Whole time in millis: " + (end - start));
        System.out.println("Time in millis for Terminal operation: " + (end - midEnd));

        start = System.currentTimeMillis();
        for (String ss1 : stringList) {
            if (ss1.contains("99")) {
                System.out.println(ss1);
                break;
            }
        }
        end = System.currentTimeMillis();
        System.out.println("Time in millis with old standard: " + (end - start));

I have executed this program many times and each time it has proved me that, creating a new stream from intermediate operations is the heavy task to do. Terminal operations do take very little time as compared to intermediate operations.

And overall, old if-else pattern is way more efficient than streams. So, again more questions here:

Did I misunderstand something?
If I understand correct, why and when to use streams?
If I am doing or understanding anything wrong, can you please help clarify my concepts Package java.util.stream?

Actual Numbers:

Try 1:

Time is millis before applying terminal operation: 73
String99
Whole time in millis: 76
Time in millis for Terminal operation: 3
String99
Time in millis with old standard: 0

Try 2:

Time is millis before applying terminal operation: 56
String99
Whole time in millis: 59
Time in millis for Terminal operation: 3
String99
Time in millis with old standard: 0

Try 3:

Time is millis before applying terminal operation: 69
String99
Whole time in millis: 72
Time in millis for Terminal operation: 3
String99
Time in millis with old standard: 0

These are my machine details if this help:

Memory: 11.6 GiB
Processor: Intel® Core™ i7-3632QM CPU @ 2.20GHz × 8 
OS-Type: 64-bit

Upvotes: 2

Answers (4)

Hank D

Reputation: 6491

One of the rationales for the Stream api is that it eliminates the inherent assumption of the for loop, that all iteration happens in the same way. When you use an iterator-based for loop, you are hard-coding the iteration logic to always iterate sequentially. Consider the question, "what if I wanted to change the implementation of the 'for' loop with something more efficient?"

The Stream api addresses that--it abstracts the notion of iteration and allows other ways of processing multiple data points to be considered -- iterate serially vs. in parallel, add optimizations if it is known that the data is unordered, etc.

Consider your example--although you can't change the implementation of the for loop, you can change the implementation of the Stream to suit different situations. For example, if you have more cpu-intensive operations to do on each task, you might choose a parallel Stream. Here's an example with 10 ms delays simulate more complex processing, done in parallel, with very different results:

    List<String> stringList = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        stringList.add("String" + i);
    }
    long   start  = System.currentTimeMillis();
    Stream stream = stringList.parallelStream().filter(s -> {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return s.contains("99" );});
    long   midEnd = System.currentTimeMillis();
    System.out.println("Time is millis before applying terminal operation: " + (midEnd - start));
    System.out.println(stream.findAny().get());
    long end = System.currentTimeMillis();
    System.out.println("Whole time in millis: " + (end - start));
    System.out.println("Time in millis for Terminal operation: " + (end - midEnd));

    start = System.currentTimeMillis();
    for (String ss1 : stringList) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        if (ss1.contains("99")) {
            System.out.println(ss1);
            break;
        }
    }
    end = System.currentTimeMillis();
    System.out.println("Time in millis with old standard: " + (end - start));

I kept the same benchmark logic everyone is complaining about, to make it easier for you to compare.

As you can see, there are situations where for loops will always be more efficient than using a Stream, but Streams offer significant advantages in certain situations as well. It would be unwise to extrapolate from one isolated test that one approach is always better than the other--that is an axiom for life as well.

Upvotes: 4

a better oliver

Reputation: 26848

As others have already have noted your benchmark is flawed. The main problem is that the results are skewed by ignoring compilation time. Try the following:

    Stream stream = stringList.stream().filter(s -> s.contains("99"));
    long   start  = System.currentTimeMillis();
    stream = stringList.stream().filter(s -> s.contains("99"));
    long   midEnd = System.currentTimeMillis();

Now the code that backs filter is already compiled and the second call is fast. Even this would work:

    Stream stream = stringList.stream().map(s -> s);
    long   start  = System.currentTimeMillis();
    stream = stringList.stream().filter(s -> s.contains("99"));
    long   midEnd = System.currentTimeMillis();

map shares most of the code with filter, so calling filter is fast here, too, because the code is already compiled. And in case you ask: Calling filter or map on a different stream would work too, of course.

Your "old style" code doesn't require additional compilation.

Upvotes: 2

Eugene

Reputation: 121048

Unless your tests involve JMH, then your code is pretty much a proof of nothing and even worse, it will give an ALTERED impression of reality.

assylias made the comment that should make it clear on what goes wrong.

Also your measurements of the "intermediate operation" and then the "short circuit" are also wrong. The intermediate operation, because it is lazy, does nothing really, it will only take place when a terminal one will kick in.

If you ever worked with guava, this is how transform/filter is done in their code also, at least logically.

Upvotes: 3

Sleiman Jneidi

Reputation: 23349

I really don't trust your "benchmark", because too many things can go wrong, you better use a framework. But anyways, when people or docs say it is more efficient they don't mean the example you provided.

Streams as lifted collection (they don't hold data) are more efficient than eager ones like Scala Lists for instance where a filter allocates a new List and the map transforms the results to a new List.

When we compare with this implementation Streams win. But yeah streams allocate objects which is vey cheap on modern JVMs and looked after in modern GC's.

Upvotes: 2

How Stream is more efficient?

Answers (4)

Related Questions