Reputation: 837
What is the best and the fastest way to check if a string matches something like:"AB+SPACE+NUMBER+ANYTHING
eg: "AB 1234 DEFG"
I am looking for the most efficient way, as it is going to compare thousands of transaction per minute.
Upvotes: 1
Views: 731
Reputation: 65811
To extract the maximum throughput you may just possibly be able to beat a pre-compiled regex Pattern
with a state machine - so long as you can code the pattern and it does not change.
enum Match {
Start {
@Override
Match match(char c) {
return A.match(c);
}
},
A {
@Override
Match match(char c) {
return c == 'A' ? B : Fail;
}
},
B {
@Override
Match match(char c) {
return c == 'B' ? Space : Fail;
}
},
Space {
@Override
Match match(char c) {
return c == ' ' ? Number : Fail;
}
},
Number {
@Override
Match match(char c) {
return Character.isDigit(c) ? Number : Anything;
}
},
Anything {
@Override
Match match(char c) {
return Anything;
}
},
Fail {
@Override
Match match(char c) {
return Fail;
}
};
abstract Match match(char c);
static boolean matches(String s) {
Match state = Start;
for (int i = 0; i < s.length() && state != Fail; i++) {
state = state.match(s.charAt(i));
}
return state != Fail;
}
}
public void test() {
List<String> tests = Arrays.asList("AB 123Hello", "ABC 123Hello", "AB Hello", "AB 0 Hello");
for (String s : tests) {
boolean match = Match.matches(s);
System.out.println(s + " - " + (match ? "Matches" : "Fails"));
}
}
Actually it is possible to build these state machines on the fly - which is sort of what Pattern.compile
does - but growing your own and hard-coding like this can sometimes achieve faster throughput.
Please be sure to test your state machine against a standard Java Pattern
before use as it could easily happen that the Pattern
is actually a little faster.
Note that as coded here the number can have zero digits. If that is acceptable then you could skip that state completely and go straight to Anything
from Space
. If not then add a FirstDigit
state to make sure at least one digit is present. You could even go straight to Anything
after FirstDigit
.
Upvotes: 1
Reputation: 124235
Avoid using yourString.matches(regex)
as it needs to recompile regex each time you use matches
which can be expensive operation.
It is better to compile regex once and reuse it when needed like
Pattern p = Pattern.compile(regex);//compile once
for (String data : dataCollection){
Matcher m = p.matcher(data);
//... if (m.foo(bar))//some condition
}
Another reason to avoid matches
is that it can return true, if entire data will match regex, which means it will need to traverse entire string even if you could make your decision when immediately after seeing AB 1234
part at start of your data.
So instead of checking if string is in form AB(space)digits(restOfData)
which would require us to traverse over restOfData
part we can check if we can find
AB(space)digits
part at start of the string. Such regex could look like ^AB\\s\\d+
^
- anchor representing start of the stringAB
- literals representing AB text\\s
- one of whitespaces characters\\d+
- digit ("\\d"
) which can appear once or more (+
)So your code can look like
private static final Pattern p = Pattern.compile("^AB\\s\\d+");
private static boolean validate(String data){
return p.matcher(data).find();
}
Upvotes: 2
Reputation: 440
Notice: I really suck at making efficient code, unless I use scala, then it's different, but right now my scala IDE isn't working, so you just get the java.
Java: (again, probobly not very efficient)
public static boolean checkString(String original) {
int space1 = original.indexOf(" ");
String section1 = original.substring(0, space1);
String sections2Onwards = original.substring(space1+1, original.length());
int space2 = sections2Onwards.indexOf(" ");
String section2 = sections2Onwards.substring(0,space2);
String end = original.substring(space2+1, original.length());
//Now on to the good, fun part, making sure that it is in fact the right pattern
//This checks that every character in section1 is not a number
char[] section1Split = section1.toCharArray();
for (char c : section1Split) {
try {
new Integer(new String(new char[] {c}));
return false;
} catch(NumberFormatException n) {}
}
//Now check that section2 is a number
try {
new Integer(section2);
} catch(NumberFormatException n) {
return false;
}
//Making sure that there are no spaces in "anything"
//Ignore this if "anything" can include spaces
if(end.indexOf(" ") > -1) {
return false;
}
//Since all conditions are true, return true! This string is legit!
return true;
}
Avinash's answer is probobly better and faster, but I (and quite a few other people on stack overflow) can't understand how to make those patterns.
Hope I helped a bit!
Upvotes: -1
Reputation: 174706
Use string.matches
function which uses regex for string matching.
string.matches("AB \\d+.*");
^ ^ ^ ^
| | | |_ Anything
AB | Num
|
space
If you want space after number, then use
"AB \\d+ .+"
Upvotes: 2