tryingToLearn

Reputation: 11659

How optimized are Java 8 stream filters over collection methods?

For example I have a comma separated string:

String multiWordString= "... , ... , ... ";

And I want to check whether another string str is present in the csv string. Then I can do following 2 things:

1.

boolean contains = Arrays.asList(multiWordString.split(",")).contains(str);

2.

boolean contains = Arrays.asList(multiWordString.split(",")).stream().filter(e -> e.equals(str)).findFirst().isPresent();

EDIT: The sample string happens to use a comma as its delimiter. I should have used a better name for the sample string to avoid confusion, so I have updated the name. In this question I am trying to find the performance difference between using Java 8 streams and loops/collection methods.

Upvotes: 4

Views: 386

Answers (3)

Eugene

Reputation: 120848

Without tests it's impossible to tell. The internal details of how either solution works can change, so the best way is to measure. It is known, though, that streams are a bit slower, since they have infrastructure behind them...

Here is a naive simple test (with little data):

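import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)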
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class CSVParsing {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(CSVParsing.class.getSimpleName())
                .jvmArgs("-ea")
                .shouldFailOnError(true)
                .build();
        new Runner(opt).run();
    }

    @Param(value = { "a,e, b,c,d",
            "a,b,c,d, a,b,c,da,b,c,da,b,c,da,b,c,da,b,c,da,b,c,da,b,c,da,b,c,d, e",
            "r, m, n, t,r, m, n, tr, m, n, tr, m, n, tr, m, n, tr, m, n, tr, m, n, tr, m, n, t, e" })
    String csv;

    @Fork(1)
    @Benchmark
    public boolean containsSimple() {
        return Arrays.asList(csv.split(",")).contains("e");
    }

    @Fork(1)
    @Benchmark
    public boolean containsStream() {
        return Arrays.asList(csv.split(",")).stream().filter(e -> e.equals("e")).findFirst().isPresent();
    }

    @Fork(1)
    @Benchmark
    public boolean containsStreamParallel() {
        // parallelStream() so this benchmark actually measures the parallel pipeline
        return Arrays.asList(csv.split(",")).parallelStream().filter(e -> e.equals("e")).findFirst().isPresent();
    }
}

Even if you don't understand the code, the results are simple numbers that you can compare:

 Benchmark (first csv parameter)         Score (ns/op)       Error
 CSVParsing.containsSimple                     181.201  ±    5.390
 CSVParsing.containsStream                     255.851  ±    5.598
 CSVParsing.containsStreamParallel             295.296  ±   57.800

I am not going to show the rest of the results (for other parameters) since they are in the same range.

The bottom line is that they do differ, by up to about 100 ns. Let me reiterate that: nanoseconds.

There is indeed a difference, but if you honestly care about a difference this small, then csv parsing is probably the wrong choice in the first place.

Upvotes: 6

GhostCat

Reputation: 140427

The "amount" of work is the same in both versions: a list needs to be created and all elements need to be compared.

Option 2 adds the overhead of setting up the stream machinery under the covers. Thus option 2 burns significantly more CPU cycles than option 1. Streams do not come for free!

Efficiency has different aspects here. When your csv input is relatively simple, a regular expression might do fine (to check whether some pattern can be found in the csv string). And when you have to deal with arbitrary csv input (for example, with values that contain quoted commas), that simple split by comma will lead to incorrect results anyway.
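To illustrate the regular expression idea, here is a minimal sketch. The class and method names (RegexTokenCheck, containsToken) are made up for this example, and it assumes simple input without quoted fields. Note that, unlike split(",") followed by contains, this pattern tolerates whitespace around the token, which is an assumption on my side about what you want.

```java
import java.util.regex.Pattern;

class RegexTokenCheck {
    // Hypothetical helper: true if `token` appears as a whole comma-separated field of `csv`.
    // Assumes simple input: no quoted fields, no escaped commas.
    static boolean containsToken(String csv, String token) {
        // Pattern.quote treats the token literally, so regex metacharacters in it are harmless.
        // The field must be bounded by the string start/end or a comma, with optional whitespace.
        Pattern p = Pattern.compile("(?:^|,)\\s*" + Pattern.quote(token) + "\\s*(?:,|$)");
        return p.matcher(csv).find();
    }

    public static void main(String[] args) {
        System.out.println(containsToken("a,e, b,c,d", "e")); // true
        System.out.println(containsToken("a,be,c", "e"));     // false: "be" is not "e"
    }
}
```

If the same token is checked many times, compiling the Pattern once and reusing it would avoid repeated compilation.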

Upvotes: 2

Ryan Leach

Reputation: 4470

Watch out: CSVs are generally more complex than just commas separating strings, and there are escaped commas to worry about as well. I hope this is either just an example or not an actual CSV format being imported.

You shouldn't convert from an array to a list first; go straight from the array to the stream using Arrays.stream or Stream.of().
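As a rough sketch of going straight from the array to the stream (the class and method names here are only for illustration), anyMatch also expresses the "is it present" check more directly than filter plus findFirst:

```java
import java.util.Arrays;

class DirectStreamCheck {
    // Hypothetical helper: streams over the split array without wrapping it in a List first.
    static boolean containsViaStream(String multiWordString, String str) {
        // anyMatch short-circuits: it stops at the first element equal to str.
        return Arrays.stream(multiWordString.split(",")).anyMatch(str::equals);
    }

    public static void main(String[] args) {
        System.out.println(containsViaStream("a,e, b,c,d", "e")); // true
        System.out.println(containsViaStream("a,e, b,c,d", "b")); // false: the element is " b"
    }
}
```

Like the original split-based versions, this does not trim whitespace, so " b" and "b" are different elements.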

But the streams are lazy, and they only do as much work as they need to do.

.contains(str) will abort as soon as it finds a match.
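A small sketch to make the laziness visible: it counts how many elements a sequential stream's filter actually examines before findFirst stops. The names (ShortCircuitDemo, elementsExamined) are made up for this example, and it only covers sequential streams.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ShortCircuitDemo {
    // Hypothetical helper: returns how many elements the filter predicate was called on
    // before the sequential pipeline stopped.
    static int elementsExamined(List<String> items, String target) {
        AtomicInteger examined = new AtomicInteger();
        items.stream()
             .filter(e -> {
                 examined.incrementAndGet(); // count every element the pipeline pulls
                 return e.equals(target);
             })
             .findFirst(); // short-circuiting terminal operation; result ignored here
        return examined.get();
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "e", "b", "c");
        System.out.println(elementsExamined(items, "e")); // 2: stops right after the match
        System.out.println(elementsExamined(items, "z")); // 4: no match, every element checked
    }
}
```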

It's hard to tell performance without measuring it, so for now make the program correct and easy to maintain.

If performance is still a concern once you have a working version, profile it to see which parts could be better, try alternatives, then pick the winner.

Upvotes: 2
