Reputation: 1619
I just came across a question when using a List
and its stream()
method. While I know how to use them, I'm not quite sure about when to use them.
For example, I have a list, containing various paths to different locations. Now, I'd like to check whether a single, given path contains any of the paths specified in the list. I'd like to return a boolean
based on whether or not the condition was met.
This of course, is not a hard task per se. But I wonder whether I should use streams, or a for(-each) loop.
The List
private static final List<String> EXCLUDE_PATHS = Arrays.asList(
"my/path/one",
"my/path/two"
);
Example using Stream:
private boolean isExcluded(String path) {
return EXCLUDE_PATHS.stream()
.map(String::toLowerCase)
.filter(path::contains)
.collect(Collectors.toList())
.size() > 0;
}
Example using for-each loop:
private boolean isExcluded(String path){
for (String excludePath : EXCLUDE_PATHS) {
if (path.contains(excludePath.toLowerCase())) {
return true;
}
}
return false;
}
Note that the path
parameter is always lowercase.
My first guess is that the for-each approach is faster, because the loop would return immediately, if the condition is met. Whereas the stream would still loop over all list entries in order to complete filtering.
Is my assumption correct? If so, why (or rather when) would I use stream()
then?
Upvotes: 124
Views: 60011
Reputation: 22369
Radical answer:
Never. Ever. Ever.
I almost never iterated a list for anything, especially to find something, yet stream users and systems seem filled with that way of coding.
I find it difficult to refactor and organize such code and I see redundancy and over iteration everywhere in stream heavy systems. In the same method you might see it 5 times. Same list, finding different things.
It is also not really shorter either. Rarely is. Definitely not more readable but that is a subjective opinion. Some people will say it is. I don't. People might like it due to autocompletion but in my editor Intellij, I can just iter
or itar
and have the for loop auto created for me with types and everything.
Often misused and overused, and I think it is better to avoid it completely. Java is not a true functional language and Java generics suck and are not expressive enough, and certainly more difficult to read, parse and refactor. Just try to visit any of the native Java stream libraries. Do you find that easy to parse?
Also, stream code is not easily extractable or refactorable unless you want to start adding weird methods that return Optionals
, Predicates
, Consumers
and what not and you end up having methods returning and taking all kinds of weird generic constraints with orders and meanings only God knows what.
Too much is inferred where you need to visit methods to figure out the types of various things.
Trying to make Java behave like a functional language like Haskell or Lisp is a fools errand. A heavy streams based Java system is always going to be more complex than a none one and way less performant and more complex to refactor and maintain.
Thus also more buggy and filled with patch work. Glue work everywhere due to the redundancy often filled in such systems. Some people just don't have an issue with redundancy. I am not one of them. Nor should you be either.
When OpenJDK got involved they started adding things to the language without really thinking it thoroughly enough. It is now not just Java Streams which is an issue. Now systems are inherently more complex because they require more base knowledge of these API's. You might have it, but your colleagues don't. They sure as hell know what a for loop is and what an if block is.
Furthermore, since you also can not assign anything to a non final variable you can rarely do two things at the same while looping, so you end up iterating twice, or thrice.
Most that like and prefer the stream approach over a for loop are most likely people that started learning Java post Java 8. Those before hate it. The thing is that it is far more complex to use, refactor and more difficult to use the right way. It requires skills to not fuck up, and then even more skills and energy to repair fuck ups.
And when I say it performs worse, it is not in comparison to a for loop, which is also a very real thing but more due to the tendency such code have to over iterate a wide range of things. It is deemed so easy to iterate a list to find an item that it tends being done over and over again.
I've not seen a single system that has benefitted from it. All of the systems I have seen are horribly implemented, mostly because of it, and I've worked in some of the biggest companies in the world.
Code is definitely not more readable than a for loop and a for loop is definitely more flexible and refactorable. The reason we see so many complex shitty systems and bugs everywhere today is, I promise you due to the heavy reliance on streams to filter, not to mention the accompanied overuse of Lombok and Jackson. Those three are the hallmark of a badly implemented system. Keyword overuse. A patch work approach.
Again, I consider it really bad to iterate a list to find anything. Yet with Stream based systems, this is what people do all the time. It is also not rare and difficult to parse and detect that an iteration might be O(N2) while with a for loop you would immediately see it.
What is often customary to ask the database to filter things for you it is now not rare that instead a base query instead return a big list of things with all kind of iterative logic and methods to filter out the undesirables and of course they use streams to do this. All kinds of methods arises around that big list with various things to filter out things.
Often redundant filtering and thus logic too. Over and over again.
Of course, I do not mean you. But your colleagues. Right?
Personally, I rarely ever iterate anything. I use the right datasets and rely on the database to filter it for me. Once. However in a streams heavy system you will see iteration everywhere.
In the deepest method, in the caller, caller of caller, caller of the caller of the caller. Streams everywhere. It is ugly. And good luck refactoring that code that lives in tiny lambdas. And good luck reusing them. Nobody will look to reuse your nice Predicates.
And if they want to use them, guess what? They need to use more Streams. You just got yourself addicted and cornered yourself further. Now, are you proposing I start splitting all of my code in tiny Predicates, Consumers, Function and BiFcuntions? Just so I can reuse that logic for Streams?
Of course I hate it just as much in Javascript as well where over iteration is everywhere by noob frontend developers.
You might say the cost is nothing to iterate a list but the system complexity grows, redundancy increases and therefore maintenance costs and number of bugs increases. It becomes a patch and glue based approach to various things. Just add another filter and remove this, rather than code things the right way.
Furthermore, where you need three servers to host all of your users, I can manage with just one. So required scalability of such a system is going to be required way earlier than a non streams heavy system. For small projects that is a very important metric. Where you can have say 5000 concurrent users, my system can handle twice or thrice that.
I have no need for it in my code, and when I am in charge of new projects, the first rule is that streams are totally forbidden to use.
That is not to say there are not use cases for it or that it might be useful at times but the risks associated with allowing it far outweighs the benefits.
When you start using Streams you are essentially adopting a whole new programming paradigm. The entire programming style of the system will change and that is what I am concerned about.
You do not want that style. It is not superior to the old style. Especially on Java.
Take the Futures API as an example.
Sure, you could start coding everything to return a Promise or a Future, but do you really want to? Is that going to resolve anything? Can your entire system really follow up on being that, everywhere?
Will it be better for you, or are you just experimenting and hoping you will benefit at some point?
There are people that overdo JavaRx and overdo promises in JavaScript as well. There are really really few cases for when you really want to have things futures based and very many many corner cases will be felt where you will find that those APIs have certain limitations and you just got made.
You can build really really complex and far far more maintainable systems without all that crap.
This is what it is about. It is not about your hobby project expanding and becoming a horrible code base.
It is about what is best approach to build large and complex enterprise systems and ensure they remain coherent, consistent refactorable, and easily maintainable.
Furthermore, rarely are you ever working on such systems on your own.
You are very likely working with a minimum of > 10 people all experimenting and overdoing Streams.
So while you might know how to use them properly you can rest assure the other 9 really don't. They just love experimenting and learning by doing.
I will leave you with these wonderful examples of real code, with thousands of more similar to them:
Or this:
Or this:
Or this:
Try refactoring any of the above. I challenge you. Give it a try. Everything is a Stream, everywhere. This is what Stream developers do, they overdo it, and there is no easy way to grasp what the code is actually doing. What is this method returning, what is this transformation doing, what do I end up with. Everything is inferred. Much more difficult to read for sure.
If you understand this, then you must be the einstein, but you should know not everyone is like you, and this could be your system in a very near future.
Do note, this is not isolated to this one project but I've seen many of them very similar to these structures.
One thing is for sure, horrible coders love streams.
Upvotes: 9
Reputation: 1975
Your assumption is correct. Your stream implementation is slower than the for-loop.
This stream usage should be as fast as the for-loop though:
EXCLUDE_PATHS.stream()
.map(String::toLowerCase)
.anyMatch(path::contains);
This iterates through the items, applying String::toLowerCase
and the filter to the items one-by-one and terminating at the first item that matches.
Both collect()
& anyMatch()
are terminal operations. anyMatch()
exits at the first found item, though, while collect()
requires all items to be processed.
Upvotes: 100
Reputation: 45
As others have mentioned many good points, but I just want to mention lazy evaluation in stream evaluation. When we do map()
to create a stream of lower case paths, we are not creating the whole stream immediately, instead the stream is lazily constructed, which is why the performance should be equivalent to the traditional for loop. It is not doing a full scanning, map()
and anyMatch()
are executed at the same time. Once anyMatch()
returns true, it will be short-circuited.
Upvotes: 1
Reputation: 2116
Yeah. You are right. Your stream approach will have some overhead. But you may use such a construction:
private boolean isExcluded(String path) {
return EXCLUDE_PATHS.stream().map(String::toLowerCase).anyMatch(path::contains);
}
The main reason to use streams is that they make your code simpler and easy to read.
Upvotes: 22
Reputation: 527
The goal of streams in Java is to simplify the complexity of writing parallel code. It's inspired by functional programming. The serial stream is just to make the code cleaner.
If we want performance we should use parallelStream, which was designed to. The serial one, in general, is slower.
There is a good article to read about ForLoop
, Stream
and ParallelStream
Performance.
In your code we can use termination methods to stop the search on the first match. (anyMatch...)
Upvotes: 14
Reputation: 298233
The decision whether to use Streams or not should not be driven by performance consideration, but rather by readability. When it really comes to performance, there are other considerations.
With your .filter(path::contains).collect(Collectors.toList()).size() > 0
approach, you are processing all elements and collecting them into a temporary List
, before comparing the size, still, this hardly ever matters for a Stream consisting of two elements.
Using .map(String::toLowerCase).anyMatch(path::contains)
can save CPU cycles and memory, if you have a substantially larger number of elements. Still, this converts each String
to its lowercase representation, until a match is found. Obviously, there is a point in using
private static final List<String> EXCLUDE_PATHS =
Stream.of("my/path/one", "my/path/two").map(String::toLowerCase)
.collect(Collectors.toList());
private boolean isExcluded(String path) {
return EXCLUDE_PATHS.stream().anyMatch(path::contains);
}
instead. So you don’t have to repeat the conversion to lowcase in every invocation of isExcluded
. If the number of elements in EXCLUDE_PATHS
or the lengths of the strings becomes really large, you may consider using
private static final List<Predicate<String>> EXCLUDE_PATHS =
Stream.of("my/path/one", "my/path/two").map(String::toLowerCase)
.map(s -> Pattern.compile(s, Pattern.LITERAL).asPredicate())
.collect(Collectors.toList());
private boolean isExcluded(String path){
return EXCLUDE_PATHS.stream().anyMatch(p -> p.test(path));
}
Compiling a string as regex pattern with the LITERAL
flag, makes it behave just like ordinary string operations, but allows the engine to spent some time in preparation, e.g. using the Boyer Moore algorithm, to be more efficient when it comes to the actual comparison.
Of course, this only pays off if there are enough subsequent tests to compensate the time spent in preparation. Determining whether this will be the case, is one of the actual performance considerations, besides the first question whether this operation will ever be performance critical at all. Not the question whether to use Streams or for
loops.
By the way, the code examples above keep the logic of your original code, which looks questionable to me. Your isExcluded
method returns true
, if the specified path contains any of the elements in list, so it returns true
for /some/prefix/to/my/path/one
, as well as my/path/one/and/some/suffix
or even /some/prefix/to/my/path/one/and/some/suffix
.
Even dummy/path/onerous
is considered fulfilling the criteria as it contains
the string my/path/one
…
Upvotes: 39