Shitu
Shitu

Reputation: 837

Efficicient way for Java pattern Matching

What is the best and the fastest way to check if a string matches something like:"AB+SPACE+NUMBER+ANYTHING eg: "AB 1234 DEFG" I am looking for the most efficient way, as it is going to compare thousands of transaction per minute.

Upvotes: 1

Views: 731

Answers (4)

OldCurmudgeon
OldCurmudgeon

Reputation: 65811

To extract the maximum throughput you may just possibly be able to beat a pre-compiled regex Pattern with a state machine - so long as you can code the pattern and it does not change.

enum Match {

    Start {

                @Override
                Match match(char c) {
                    return A.match(c);
                }

            },
    A {

                @Override
                Match match(char c) {
                    return c == 'A' ? B : Fail;
                }

            },
    B {

                @Override
                Match match(char c) {
                    return c == 'B' ? Space : Fail;
                }

            },
    Space {

                @Override
                Match match(char c) {
                    return c == ' ' ? Number : Fail;
                }

            },
    Number {

                @Override
                Match match(char c) {
                    return Character.isDigit(c) ? Number : Anything;
                }

            },
    Anything {

                @Override
                Match match(char c) {
                    return Anything;
                }

            },
    Fail {

                @Override
                Match match(char c) {
                    return Fail;
                }

            };

    abstract Match match(char c);

    static boolean matches(String s) {
        Match state = Start;
        for (int i = 0; i < s.length() && state != Fail; i++) {
            state = state.match(s.charAt(i));
        }
        return state != Fail;
    }
}

public void test() {
    List<String> tests = Arrays.asList("AB 123Hello", "ABC 123Hello", "AB Hello", "AB 0 Hello");
    for (String s : tests) {
        boolean match = Match.matches(s);
        System.out.println(s + " - " + (match ? "Matches" : "Fails"));

    }
}

Actually it is possible to build these state machines on the fly - which is sort of what Pattern.compile does - but growing your own and hard-coding like this can sometimes achieve faster throughput.

Please be sure to test your state machine against a standard Java Pattern before use as it could easily happen that the Pattern is actually a little faster.

Note that as coded here the number can have zero digits. If that is acceptable then you could skip that state completely and go straight to Anything from Space. If not then add a FirstDigit state to make sure at least one digit is present. You could even go straight to Anything after FirstDigit.

Upvotes: 1

Pshemo
Pshemo

Reputation: 124235

Avoid using yourString.matches(regex) as it needs to recompile regex each time you use matches which can be expensive operation.
It is better to compile regex once and reuse it when needed like

Pattern p = Pattern.compile(regex);//compile once
for (String data : dataCollection){
    Matcher m = p.matcher(data);
    //... if (m.foo(bar))//some condition

}

Another reason to avoid matches is that it can return true, if entire data will match regex, which means it will need to traverse entire string even if you could make your decision when immediately after seeing AB 1234 part at start of your data.

So instead of checking if string is in form AB(space)digits(restOfData) which would require us to traverse over restOfData part we can check if we can find AB(space)digits part at start of the string. Such regex could look like ^AB\\s\\d+

  • ^ - anchor representing start of the string
  • AB - literals representing AB text
  • \\s - one of whitespaces characters
  • \\d+ - digit ("\\d") which can appear once or more (+)

So your code can look like

private static final Pattern p = Pattern.compile("^AB\\s\\d+");
private static boolean validate(String data){
    return p.matcher(data).find();
}

Upvotes: 2

Mrab Ezreb
Mrab Ezreb

Reputation: 440

Notice: I really suck at making efficient code, unless I use scala, then it's different, but right now my scala IDE isn't working, so you just get the java.
Java: (again, probobly not very efficient)

public static boolean checkString(String original) {
    int space1 = original.indexOf(" ");
    String section1 = original.substring(0, space1);
    String sections2Onwards = original.substring(space1+1, original.length());
    int space2 = sections2Onwards.indexOf(" ");
    String section2 = sections2Onwards.substring(0,space2);
    String end = original.substring(space2+1, original.length());
    //Now on to the good, fun part, making sure that it is in fact the right pattern
    //This checks that every character in section1 is not a number
    char[] section1Split = section1.toCharArray();
    for (char c : section1Split) {
        try {
            new Integer(new String(new char[] {c}));
            return false;
        } catch(NumberFormatException n) {}
    }
    //Now check that section2 is a number
    try {
        new Integer(section2);
    } catch(NumberFormatException n) {
        return false;
    }
    //Making sure that there are no spaces in "anything"
    //Ignore this if "anything" can include spaces
    if(end.indexOf(" ") > -1) {
        return false;
    }
    //Since all conditions are true, return true! This string is legit!
    return true;
}

Avinash's answer is probobly better and faster, but I (and quite a few other people on stack overflow) can't understand how to make those patterns.

Hope I helped a bit!

Upvotes: -1

Avinash Raj
Avinash Raj

Reputation: 174706

Use string.matches function which uses regex for string matching.

string.matches("AB \\d+.*");
                ^ ^  ^ ^
                | |  | |_ Anything
               AB | Num
                  |
                space

If you want space after number, then use

"AB \\d+ .+"

Upvotes: 2

Related Questions