Reputation: 324
This is probably an incredibly simple question, and likely a duplicate (although I did try to check beforehand), but which is less expensive when used in a loop: String.replaceAll() or Matcher.replaceAll()?
While I was told
Pattern regexPattern = Pattern.compile("[^a-zA-Z0-9]");
Matcher matcher;
String thisWord;
while (Scanner.hasNext()) {
    matcher = regexPattern.matcher(Scanner.next());
    thisWord = matcher.replaceAll("");
    ...
}
is better, because you only have to compile the regex once, I would think that the benefits of
String thisWord;
while (Scanner.hasNext()) {
    thisWord = Scanner.next().replaceAll("[^a-zA-Z0-9]", "");
    ...
}
far outweigh those of the matcher method, due to not having to initialize the matcher every time. (I understand the matcher exists already, so you are not recreating it.)
Can someone please explain how my reasoning is false? Am I misunderstanding what Pattern.matcher() does?
Upvotes: 0
Views: 1471
Reputation: 15126
It is more efficient to reset and reuse a single matcher. That way a new Matcher is not generated on every pass through the loop; each new instance would otherwise copy most of the same information relating to the Pattern structure.
Pattern regexPattern = Pattern.compile("[^a-zA-Z0-9]");
Matcher matcher = regexPattern.matcher("");
String thisWord;
while (Scanner.hasNext()) {
    matcher = matcher.reset(Scanner.next());
    thisWord = matcher.replaceAll("");
    // ...
}
There is a one-off cost to create the matcher outside the loop with regexPattern.matcher(""), but the calls to matcher.reset(xxx) will be quicker because they reuse that matcher rather than creating a new Matcher instance each time. This reduces the amount of GC required.
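For anyone who wants to try the reset approach in isolation, here is a minimal, self-contained sketch. The class name MatcherResetSketch, the method cleanWords, and the sample input are made up for illustration, and it assumes the scanner is a java.util.Scanner instance (named scanner here) reading whitespace-separated tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are not from the question.
class MatcherResetSketch {
    // Compiled once and shared by every call.
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^a-zA-Z0-9]");

    // Strips non-alphanumeric characters from each whitespace-separated token.
    static List<String> cleanWords(String input) {
        List<String> words = new ArrayList<>();
        // One Matcher, created up front against an empty input.
        Matcher matcher = NON_ALPHANUMERIC.matcher("");
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                // reset(CharSequence) points the existing matcher at the new token.
                matcher.reset(scanner.next());
                words.add(matcher.replaceAll(""));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(cleanWords("Hello, world! It's 2024."));
        // prints [Hello, world, Its, 2024]
    }
}
```

Note that a Matcher is not thread-safe, so a reused instance like this should stay confined to one thread.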
Upvotes: 0
Reputation: 183270
In OpenJDK, String.replaceAll is defined as follows:
public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}
So at least with that implementation, it won't give better performance than compiling the pattern only once and using Matcher.replaceAll.
It's possible that there are other JDK implementations where String.replaceAll is implemented differently, but I'd be very surprised if there were any where it performed better than Matcher.replaceAll.
[…] due to not having to initialize the matcher every time. (I understand the matcher exists already, so you are not recreating it.)
I think you have a misunderstanding here. You really do create a new Matcher instance on each loop iteration; but that is very cheap, and not something to be concerned about performance-wise.
Incidentally, you don't actually need a separate 'matcher' variable if you don't want one; you'll get exactly the same behavior and performance if you write:
thisWord = regexPattern.matcher(Scanner.next()).replaceAll("");
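If you want to convince yourself that the two forms produce the same result, a small sketch like the following should do. The class name ReplaceAllEquivalenceSketch, the helper names viaString and viaPattern, and the sample words are made up for illustration; it only compares output and makes no attempt to measure timing.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are not from the question.
class ReplaceAllEquivalenceSketch {
    private static final String REGEX = "[^a-zA-Z0-9]";
    // Compiled once, reused for every word.
    private static final Pattern PRECOMPILED = Pattern.compile(REGEX);

    // String.replaceAll takes the regex as a String on every call.
    static String viaString(String word) {
        return word.replaceAll(REGEX, "");
    }

    // Reuses the precompiled Pattern; only a new Matcher is created per call.
    static String viaPattern(String word) {
        return PRECOMPILED.matcher(word).replaceAll("");
    }

    public static void main(String[] args) {
        String[] samples = {"Hello,", "it's", "a-b_c", "2024."};
        for (String s : samples) {
            // Both columns should be identical for every sample.
            System.out.println(viaString(s) + " " + viaPattern(s));
        }
    }
}
```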
Upvotes: 1