MiniScalope
MiniScalope

Reputation: 1479

Regex non greedy alternations

I want to check if every line of a file match with multiple regex patterns.

Example: test this line of my text file

123;456;789

against 3 different expressions

1.*;.*;..9
3.*;.*;787
.2.;.*;..9

and do something when a patterns match or do not match each. So I need to know which one of all my patterns matches or not in this example : only P1 and P3 matches so I execute action 1 and action 3 on input 123;456;789

The naive solution with nested for loop gives poor performances (because of the algorithm).

example:

for(String row : rows){
   for (Pattern p : patterns){
     if(p.matcher(value).matches()){
       //
     }
   }
}

I was thinking about inlining multiple regexes with a "|" operator

using the above example: (1.*;.*;..9)|(3.*;.*;787)|(.2.;.*;..9)

String expression = "(1.*;.*;..9)|(3.*;.*;787)|(.2.;.*;..9)";
String value = "123;456;789";
Pattern  pattern = Pattern.compile(expression);
Matcher matcher = pattern.matcher(value);

HashMap<Integer,Boolean> results= new HashMap<>();
if(matcher.matches()) {
    int count = matcher.groupCount();
    for (int i = 1; i <= count; ++i) {
        results.put(i, matcher.group(i) != null);
    }
}

But the engine stops at the first matching alternative

Is there a way to test multiple different patterns in a single call? Else how can I improve the algorithm without being quadratic

Upvotes: 1

Views: 118

Answers (2)

revo
revo

Reputation: 48711

That's the right behavior of a regex engine to stop where a successful match is found. To simulate what you are trying to do you should work with lookaheads but in a way they do not interrupt the match (fail soon or success soon). So something like the following regex will try to match three different capturing groups. If one regex inside capturing groups fails to match, since it is optional, the other lookahead is tried and this goes till end:

^(?=(1.*;.*;..9$)?)(?=(3.*;.*;787$)?)(?=(.2.;.*;..9$)?)

You only need to work with capturing groups later to execute some codes if a group is captured:

if (capturingGroup == 1) {
    // do something
} else if (capturingGroup == 2) {
...

See live demo here (in here two of your regexes are matched and recognizable)

Note: You may want to remove dot-stars in favor of a more restrictive pattern. Currently it matches so much.

Note: Since two of regular expressions here won't match at the same time you may change the above regex to:

^(?:(?=(1.*;.*;..9$)?)(?=(.2.;.*;..9$)?)|(3.*;.*;787)$)

Upvotes: 1

Ashishkumar Singh
Ashishkumar Singh

Reputation: 3600

The engine stops after first match because the input string is consumed by the first pattern and the next one doesn't have any input to match against with. You can use non-consuming positive look-ahead(?=) here to have the whole expression in one go

(?=(1.*;.*;..9))(3.*;.*;787)|(.2.;.*;..9)

Above is a sample case where it will match the first expression but will not consume any character for the input String, so the next regex pattern(here (3.*;.*;787)) have the whole input string as input. You can use this concept to create your desired regex pattern

Upvotes: 0

Related Questions