Jaydeep
Jaydeep

Reputation: 21

How to get good perfomance of Regex in java

Below is example of text:

String id = "A:abc,X:def,F:xyz,A:jkl";

Below is regex:

Pattern p = Pattern.compile("(.*,)?[AC]:[^:]+$");
if(p.matcher(id).matches()) {
  System.out.println("Hello world!")
}

When executed above code should print Hello world!.

Does this regex can be modified to gain more performance?

Upvotes: 2

Views: 275

Answers (4)

Ed .
Ed .

Reputation: 6403

Do you even need to use regualr expressions? It seems there isn't a huge variety in what you are testing.

If you need to use the regex as others have said, compiling it only once makes sense and if you only need to check the last token maybe you could simplify the regex to: [AC]:[^:]{3}$.

Could you possibly use something along these lines (untested...)?

private boolean isId(String id)
    {
        char[] chars = id.toCharArray();
        boolean valid = false;
        int length = chars.length;

        if (length >= 5 && chars[length - 4] == ':')
        {
            char fifthToLast = chars[length - 5];

            if (fifthToLast == 'A' || fifthToLast == 'C')
            {
                valid = true;

                for (int i = length - 1; i >= length - 4; i--)
                {
                    if (chars[i] == ':')
                    {
                        valid = false;
                        break;
                    }
                }
            }
        }

        return valid;
    }

Upvotes: 0

Esko
Esko

Reputation: 29367

Move Pattern instantiation to a final static field (erm, constant), in your current code you're recompiling essentially the same Pattern every single time (no, Pattern doesn't cache anything!). That should give you some noticeable performance boost right off the bat.

Upvotes: 1

jitter
jitter

Reputation: 54605

Pattern p = Pattern.compile(".*[AC]:[^:]+$");
if(p.matcher(id).matches()) {
  System.out.println("Hello world!")
}

As you seem to only be interested if it the string ends in A or C followed by a colon and some characters which aren't colons you can just use .* instead of (.*,)? (or do you really want to capture the stuff before the last piece?)

If the stuff after the colon is all lower case you could even do

Pattern p = Pattern.compile(".*[AC]:[a-z]+$");

And if you are going to match this multiple times in a row (e.g. loop) be sure to compile the pattern outside of the loop.

e,g

Pattern p = Pattern.compile(".*[AC]:[a-z]+$");
Matcher m = p.matcher(id);
while(....) {
  ...
  // m.matches()
  ...
  // prepare for next loop m.reset(newvaluetocheck);
}

Upvotes: 3

JasCav
JasCav

Reputation: 34632

As I can't see your entire code, I can only assume that you do the pattern compilation inside your loop/method/etc. One thing that can improve performance is to compile at the class level and not recompile the pattern each time. Other than that, I don't see much else that you could change.

Upvotes: 8

Related Questions