Reputation: 3500
In a regex OR (alternation), when several alternatives share a common prefix, the regex matches the first alternative listed in the OR instead of the longest one.
For example, for the regular expression regex = (KA|KARNATAKA) and input = KARNATAKA, the output will be 2 matches: match1 = KA and match2 = KA.
What I want instead is the longest possible match among the alternatives in the OR, which is match1 = KARNATAKA in my example.
Here is the example in a regex client
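The behaviour described above can be reproduced with a few lines of Java. This is only a minimal sketch: the class name AlternationOrderDemo and the helper findAll are made up for illustration. It shows that the order of the alternatives, not their length, decides which one matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
final class AlternationOrderDemo {
    // Collects every match that find() reports, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Short alternative first: KA wins at index 0 and again at index 6.
        System.out.println(findAll("(KA|KARNATAKA)", "KARNATAKA")); // [KA, KA]
        // Long alternative first: the whole input is consumed in one match.
        System.out.println(findAll("(KARNATAKA|KA)", "KARNATAKA")); // [KARNATAKA]
    }
}
```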
So what I am doing right now is sorting the alternatives in the OR by length in descending order.
My question is: can we specify in the regex itself that it should match the longest possible string, or is sorting the only way to do it?
I have already referred to this question and I don't see a solution other than sorting.
Upvotes: 1
Views: 779
Reputation: 8363
You can create a helper method for this:
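import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public final class PatternHelper {
    // Rewrites a single (a|b|...) group so that its longest alternative comes first.
    public static Pattern compileSortedOr(String regex) {
        // group(1): text before the group, group(2): the alternatives, group(3): text after it
        Matcher matcher = Pattern.compile("(.*)\\((.*\\|.*)\\)(.*)").matcher(regex);
        if (matcher.matches()) {
            List<String> conditions = Arrays.asList(matcher.group(2).split("\\|"));
            // Sort by length, longest alternative first
            List<String> sortedConditions = conditions.stream()
                    .sorted((c1, c2) -> c2.length() - c1.length())
                    .collect(Collectors.toList());
            return Pattern.compile(matcher.group(1) +
                    "(" +
                    String.join("|", sortedConditions) +
                    ")" +
                    matcher.group(3));
        }
        // No simple alternation group found: compile the regex unchanged
        return Pattern.compile(regex);
    }
}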
Matcher matcher = PatternHelper.compileSortedOr("(KA|KARNATAKA)").matcher("KARNATAKA");
if (matcher.matches()) {
    System.out.println(matcher.group(1));
}
Output:
KARNATAKA
P.S. This only works for simple expressions without nested brackets. You would need to tweak it if you expect more complex expressions.
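If the alternatives are available as plain strings rather than as an already assembled regex, the parsing step can be skipped entirely. The following is a rough sketch under that assumption. LongestFirstAlternation and its compile method are hypothetical names, and the alternatives are treated as literals (escaped with Pattern.quote), so this is not a drop-in replacement for the helper above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: builds the alternation from literal strings.
final class LongestFirstAlternation {
    // Sorts the literals by length (longest first), escapes each one,
    // and joins them into a single capturing group: (long|short|...).
    static Pattern compile(List<String> literals) {
        String group = literals.stream()
                .sorted(Comparator.comparingInt(String::length).reversed())
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "(", ")"));
        return Pattern.compile(group);
    }

    public static void main(String[] args) {
        Matcher matcher = compile(List.of("KA", "KARNATAKA")).matcher("KARNATAKA");
        if (matcher.find()) {
            System.out.println(matcher.group(1)); // KARNATAKA
        }
    }
}
```

Because every alternative is quoted, characters such as | or ( inside an alternative cannot break the generated pattern, which sidesteps the nested-bracket limitation at the cost of supporting only literal alternatives.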
Upvotes: 0
Reputation: 2436
You can use word boundaries (\b) to avoid matching prefixes.
For the case you mentioned, the following regex will only match KA or KARNATAKA as whole words:
(\bKA\b|\bKARNATAKA\b)
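A quick way to check the word-boundary pattern is to run it against a few inputs. This is only an illustrative sketch; the class name WordBoundaryDemo and the helper findAll are assumptions, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the word-boundary variant.
final class WordBoundaryDemo {
    static final String REGEX = "(\\bKA\\b|\\bKARNATAKA\\b)";

    // Collects every match that find() reports, in order.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(REGEX).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group(1));
        }
        return matches;
    }

    public static void main(String[] args) {
        // KA is followed by R, so \bKA\b fails and the longer alternative is tried.
        System.out.println(findAll("KARNATAKA"));    // [KARNATAKA]
        System.out.println(findAll("KA KARNATAKA")); // [KA, KARNATAKA]
        // Neither alternative is a whole word here, so nothing matches.
        System.out.println(findAll("KARNATAKAS"));   // []
    }
}
```

Note that this relies on the alternatives being delimited as whole words in the input. It does not select the longest alternative in general, as the last call shows.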
Upvotes: 1