Is this normal Java regex behavior?

Question

I found this in some code I wanted to optimize. Here is the snipet:

tempString = bigBuffer.replaceAll("\n", "");
tempString = tempString.replaceAll("\t", "");

Then I decided to use the regex wisely and I did this:

tempString = bigBuffer.replaceAll("[\n\t]", "");

Then a friend told me to do this instead:

tempString = bigBuffer.replaceAll("\n|\t", "");

Since I like to know the result of my changes I did a test to verify if it was a good optimization. So, the result with (java version "1.6.0_27") is with the first code being the reference 100%.

With the pipe it is 121% so it took more time to perform the task.

With the square bracket it is 52% so it took less time to perform the task.

Why does the regex behave differently where it should be the same?

Martin

nrobey · Accepted Answer

The first code snippet looks through bigBuffer twice, the first time replacing the new lines, and the second time replaces the tabs.

The second code snippet would search through bigBuffer only once, checking to see if each character is one or the other. This would result in the speed finishing in only half the time.

The code snippet in the third place is probably poorly compiled, and results in a particularly bad version of the first code's algorithm, though I could not say for sure without examining the path through the regex compilation carefully.

Excellent work on the testing though. Relative timing (percent-based) is useful, absolute timing (millisecond or some such) is not.

Is this normal Java regex behavior?

Answers (2)

Related Questions