Victor
Victor

Reputation: 17107

Java regular expression offers any performance benefit?

In Java, when we try to do pattern matching using a regular expression. e.g. take a input string and use regular expression to find out if it is numeric. If not, throw an exception. In this case, I understand, using regex makes the code less verbose than if we were to take each character of the string, check if it is a number and if not throw an exception.

But I was under the assumption that regex also makes the process more efficient. IS this true? I cannot find any evidence on this point. How is regex doing the match behind the scenes? IS it not also iterating over the string and checking each character one by one?

Upvotes: 9

Views: 2439

Answers (8)

Mark Bramnik
Mark Bramnik

Reputation: 42551

Just my 5 cents :) In general Regular expressions language is not intended to only parse integers or strings, Its quite a powerful tool that allows to recognize any 'regular expression'. It reminds me my university time (Remember Automatas Theory course ? :) , but here is the link that describes what the regular language really is

Now Since it builds FSMs it introduces some overhead, so maybe for Integer.parseInt regular expression engine is not a good substitution, moreover Java introduced the more specific API. However regular expressions have a benefit when working with more complex expressions and when we have a lot of them.

The Regular expression must be used wisely. The pattern must be always compiled (Otherwise it can't be reused efficiently, since compiling the pattern each time will drain the performance)

I would suggest to run the test on more complex input and see what happens.

Upvotes: 1

jahroy
jahroy

Reputation: 22692

I don't see how it could get any simpler or easier to read than:

Integer.parseInt()

or

Double.parseDouble()

They do exactly what you describe including throw an Exception for invalid input.

Regarding performance: I would expect a regex to be less efficient than the above.

Upvotes: 1

user1154664
user1154664

Reputation: 1392

About regex behind the scenes...

A finite-state machine (FSM) is equivalent to a Regular Expression. FSM is a machine that can recognize a language (in your case numbers). FSM has an alphabet, states, an initial state, N-final states and transition functions from one state to another. The string needs to be contain in the alphabet(ASCII for example). The FSM begins at the initial state. When you input a string it process char by char moving from state to state depending on a function(state, char) => state. When it reaches a final state you know if you string is a numeric or not.

For more, see FSM and see Automata-based_programming

Upvotes: 1

deleted_user
deleted_user

Reputation: 3805

To answer your question specifically:

Why dont you apply a regex pattern match over some complex text, and then attempt to write the same matching code yourself.

See which is faster.

Answer: The regex.

Upvotes: 0

assylias
assylias

Reputation: 328923

Just for fun, I have run this micro benchmark. The results of the last run (i.e. post JVM warm up / JIT) are below (results are fairly consistent from one run to another anyway):

regex with numbers 123
chars with numbers 33
parseInt with numbers 33
regex with words 123
chars with words 34
parseInt with words 733

In other words, chars is very efficient, Integer.parseInt is as efficient as char IF the string is a number, but awfully slow if the string is not a number. Regex is in between.

Conclusion

If you parse a string into a number and you expect the string to be a number in general, using Integer.parseInt is the best solution (efficient and readable). The penalty you get when the string is not a number should be low if it is not too frequent.

ps: my regex is maybe not optimal, feel free to comment.

public class TestNumber {

    private final static List<String> numbers = new ArrayList<>();
    private final static List<String> words = new ArrayList<>();

    public static void main(String args[]) {
        long start, end;
        Random random = new Random();

        for (int i = 0; i < 1000000; i++) {
            numbers.add(String.valueOf(i));
            words.add(String.valueOf(i) + "x");
        }

        for (int i = 0; i < 5; i++) {
            start = System.nanoTime();
            regex(numbers);
            System.out.println("regex with numbers " + (System.nanoTime() - start) / 1000000);
            start = System.nanoTime();
            chars(numbers);
            System.out.println("chars with numbers " + (System.nanoTime() - start) / 1000000);
            start = System.nanoTime();
            exception(numbers);
            System.out.println("exceptions with numbers " + (System.nanoTime() - start) / 1000000);

            start = System.nanoTime();
            regex(words);
            System.out.println("regex with words " + (System.nanoTime() - start) / 1000000);
            start = System.nanoTime();
            chars(words);
            System.out.println("chars with words " + (System.nanoTime() - start) / 1000000);
            start = System.nanoTime();
            exception(words);
            System.out.println("exceptions with words " + (System.nanoTime() - start) / 1000000);
        }
    }

    private static int regex(List<String> list) {
        int sum = 0;
        Pattern p = Pattern.compile("[0-9]+");
        for (String s : list) {
            sum += (p.matcher(s).matches() ? 1 : 0);
        }
        return sum;
    }

    private static int chars(List<String> list) {
        int sum = 0;

        for (String s : list) {
            boolean isNumber = true;
            for (char c : s.toCharArray()) {
                if (c < '0' || c > '9') {
                    isNumber = false;
                    break;
                }
            }
            if (isNumber) {
                sum++;
            }
        }
        return sum;
    }

    private static int exception(List<String> list) {
        int sum = 0;

        for (String s : list) {
            try {
                Integer.parseInt(s);
                sum++;
            } catch (NumberFormatException e) {
            }
        }
        return sum;
    }
}

Upvotes: 4

vbezhenar
vbezhenar

Reputation: 12366

In the end it is indeed iterating over the string and checking each character trying to find match for the provided pattern. Moreover it uses backtracking (if there are many ways that could possibly match, engine will try them all), which could result in very poor performance for some unusual cases (not likely that you'll encounter this, but theoretically possible). In worst case performance of java regular expression engine is O(2N), where N is length of input string.

There are algorithms for much faster pattern matching delivering O(N) performance but with less features comparing to Java regular expressions.

Here is an article discussing this question in details.

But in most cases the regular expression engine won't be the performance bottleneck in your application. It's fast enough, so generally don't worry about it unless your profiler points to it. And it provides a declarative description of the algorithm which is very useful because almost always iterative algorithm implementation will be much more verbose and much less readable.

Upvotes: 0

Denis Bazhenov
Denis Bazhenov

Reputation: 9965

Well, it hard to say for sure, but in general case regular expressions are less likely to be more efficient compared to explicit character checking. RE is final state automata, so there is some overhead on automata building and maintaining. In my practice explicit code is always faster (and thus more efficient) than regular expressions.

But here is the dilemma. Regular expressions are almost always more efficient from time-to-deliver point of view and more readable when used correctly . And here is another dilemma. I so rarely see correct usage of regular expressions...

In your scenario I suggest to use guava library:

boolean isValid = DIGIT.matchesAllOf("1234");

Upvotes: 0

Michael
Michael

Reputation: 1012

I don't have a technical answer yet, but I could write some code and see. I don't think that regular expressions would be the way to go for converting a string to a number. In many instances they can be more efficient, but if its written poorly it'll be slow.

May I ask however, why aren't you just using: Integer.parseInt("124")? That will throw a NumberFormatException. Should be able to handle it, and it leaves the detection of a number up to core Java.

Upvotes: 3

Related Questions