Reputation: 1479
I want to check if every line of a file match with multiple regex patterns.
Example: test this line of my text file
123;456;789
against 3 different expressions
1.*;.*;..9
3.*;.*;787
.2.;.*;..9
and do something when a patterns match or do not match each.
So I need to know which one of all my patterns matches or not
in this example :
only P1 and P3 matches so I execute action 1 and action 3 on input 123;456;789
The naive solution with nested for loop gives poor performances (because of the algorithm).
example:
for(String row : rows){
for (Pattern p : patterns){
if(p.matcher(value).matches()){
//
}
}
}
I was thinking about inlining multiple regexes with a "|" operator
using the above example: (1.*;.*;..9)|(3.*;.*;787)|(.2.;.*;..9)
String expression = "(1.*;.*;..9)|(3.*;.*;787)|(.2.;.*;..9)";
String value = "123;456;789";
Pattern pattern = Pattern.compile(expression);
Matcher matcher = pattern.matcher(value);
HashMap<Integer,Boolean> results= new HashMap<>();
if(matcher.matches()) {
int count = matcher.groupCount();
for (int i = 1; i <= count; ++i) {
results.put(i, matcher.group(i) != null);
}
}
But the engine stops at the first matching alternative
Is there a way to test multiple different patterns in a single call? Else how can I improve the algorithm without being quadratic
Upvotes: 1
Views: 118
Reputation: 48711
That's the right behavior of a regex engine to stop where a successful match is found. To simulate what you are trying to do you should work with lookaheads but in a way they do not interrupt the match (fail soon or success soon). So something like the following regex will try to match three different capturing groups. If one regex inside capturing groups fails to match, since it is optional, the other lookahead is tried and this goes till end:
^(?=(1.*;.*;..9$)?)(?=(3.*;.*;787$)?)(?=(.2.;.*;..9$)?)
You only need to work with capturing groups later to execute some codes if a group is captured:
if (capturingGroup == 1) {
// do something
} else if (capturingGroup == 2) {
...
See live demo here (in here two of your regexes are matched and recognizable)
Note: You may want to remove dot-stars in favor of a more restrictive pattern. Currently it matches so much.
Note: Since two of regular expressions here won't match at the same time you may change the above regex to:
^(?:(?=(1.*;.*;..9$)?)(?=(.2.;.*;..9$)?)|(3.*;.*;787)$)
Upvotes: 1
Reputation: 3600
The engine stops after first match because the input string is consumed by the first pattern and the next one doesn't have any input to match against with. You can use non-consuming positive look-ahead(?=)
here to have the whole expression in one go
(?=(1.*;.*;..9))(3.*;.*;787)|(.2.;.*;..9)
Above is a sample case where it will match the first expression but will not consume any character for the input String, so the next regex pattern(here (3.*;.*;787))
have the whole input string as input. You can use this concept to create your desired regex pattern
Upvotes: 0