Reputation: 2658
I have pattern "abc|de.|ghi" and an input of "def". How do I find the part of the pattern that matched "def"? So, in this example the result I want is "de.". Not "def", which is all I have been able to get using group and start/end methods.
In code:
Pattern p = Pattern.compile("abc|de.|ghi");
Matcher m = p.matcher("def");
if (m.find()) {
    // Here I want to get the String "de." somehow
}
Upvotes: 3
Views: 175
Reputation: 34873
There’s no easy way of doing this with Pattern in the standard library.
The source code for Pattern uses a recursive-descent parser to create a tree of Node objects which each support a match() method. For example, to evaluate |, there is a Branch subclass at line 4107 that stores a list of possible alternatives. Its match() method tries each alternative, and returns true if any one of the alternatives matches and the successor node matches. Otherwise it returns false.
Groups are saved by inserting special GroupHead and GroupTail nodes into the parse tree, which record the start and end position of each group in internal arrays of the Matcher that is running the match.
To find out which parts of the pattern caused the match, the node objects would need to know the parts of the pattern that caused them to be created. The parser simply doesn’t store that information in the nodes it creates. The original pattern is stored in a temp array, and the recursive-descent parser keeps a cursor index into the temp array as it parses. The parser helper methods like peek() and accept(), which are defined starting at line 1567, simply increment cursor as needed. When nodes are created, the value of cursor isn’t stored anywhere. But that value is necessary to reconstruct which parts of the pattern correspond to the match.
It’s understandable why Pattern doesn’t save this information: it would slow down all regular expression evaluations, but the additional functionality would hardly ever be used.
One possibility is to create a modified version of Pattern that does the appropriate bookkeeping to trace matches back to the parts of the pattern that they came from. For storing which part of the pattern each node corresponds to, you may be able to get away with making the Node() constructor stash a copy of the cursor field. But for using that data to find which parts of the pattern matched, you’ll need to update every Node subclass’s match() method to store the range based on the semantics of each subclass of Node….
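To make the bookkeeping idea concrete, here is a toy sketch. It is not the JDK's code: TracingBranchSketch, its Node and Branch classes, and the matchedAlternative helper are all made up for illustration, and the "regex" it understands is only literals, `.` and a top-level `|`. It just shows a node remembering the slice of the pattern it was parsed from, and a branch recording which alternative succeeded.

```java
import java.util.ArrayList;
import java.util.List;

public class TracingBranchSketch {
    // Toy node (hypothetical): matches a literal in which '.' stands for any
    // single character, and remembers the pattern text it was created from.
    static final class Node {
        final String source; // slice of the pattern this node came from

        Node(String source) {
            this.source = source;
        }

        boolean match(String input, int pos) {
            if (pos + source.length() > input.length()) return false;
            for (int i = 0; i < source.length(); i++) {
                char c = source.charAt(i);
                if (c != '.' && c != input.charAt(pos + i)) return false;
            }
            return true;
        }
    }

    // Toy branch (hypothetical): tries each alternative in order and keeps
    // a reference to the one that matched.
    static final class Branch {
        final List<Node> alternatives = new ArrayList<>();
        Node lastMatched;

        Branch(String pattern) {
            int start = 0; // start of the current alternative
            // 'cursor' plays the role of the parser's cursor index
            for (int cursor = 0; cursor <= pattern.length(); cursor++) {
                if (cursor == pattern.length() || pattern.charAt(cursor) == '|') {
                    alternatives.add(new Node(pattern.substring(start, cursor)));
                    start = cursor + 1;
                }
            }
        }

        boolean match(String input, int pos) {
            for (Node alt : alternatives) {
                if (alt.match(input, pos)) {
                    lastMatched = alt;
                    return true;
                }
            }
            lastMatched = null;
            return false;
        }
    }

    // Returns the pattern text of the alternative that matched at index 0,
    // or null if none did.
    public static String matchedAlternative(String pattern, String input) {
        Branch b = new Branch(pattern);
        return b.match(input, 0) ? b.lastMatched.source : null;
    }

    public static void main(String[] args) {
        System.out.println(matchedAlternative("abc|de.|ghi", "def")); // prints de.
    }
}
```

The real Pattern has dozens of Node subclasses, so doing the same there is a much bigger job than this sketch suggests.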
Good luck!
Upvotes: 1
Reputation: 32827
So, you want to get back the alternative of the regex that matched the input. As far as I know, there is no method in the Pattern or Matcher class that returns the individual alternative (the part between the | separators) that matched.
You could do it this way:
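// requires: import java.util.regex.Pattern;
public static String getMatchedRegexPattern(String inputRegex, String input) throws Exception
{
    // Reject any unescaped ( ) [ ] because a | inside them is not a top-level alternative
    if (Pattern.compile("(?<!\\\\)([\\(\\)\\[\\]])").matcher(inputRegex).find())
        throw new Exception("Groups,brackets not supported");
    for (String regex : inputRegex.split("(?<!\\\\)\\|")) // split only if | is not escaped
    {
        // matches() requires the alternative to match the whole input
        if (Pattern.compile(regex).matcher(input).matches())
            return regex;
    }
    return ""; // no alternative matched
}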
You could call it as
getMatchedRegexPattern("abc|de.|ghi","def");
Upvotes: 1
Reputation: 16518
It's kind of roundabout, but you can do what you want with capture groups:
Pattern p = Pattern.compile("(abc)|(de.)|(ghi)");
Matcher m = p.matcher("def");
if (m.find()) {
    if (m.group(1) != null)
        System.out.println("Matched \"abc\"");
    if (m.group(2) != null)
        System.out.println("Matched \"de.\"");
    if (m.group(3) != null)
        System.out.println("Matched \"ghi\"");
}
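If you don't want to hard-code one if per alternative, the same trick can be generalized. The following is only a sketch: the class MatchedAlternative and the method findMatchedAlternative are names I made up, and it assumes you already have the alternatives as separate strings and that none of them contains capturing groups of its own (otherwise the group numbers would be off).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchedAlternative {
    // Hypothetical helper: wraps every alternative in its own capturing group,
    // joins them with |, and returns the alternative whose group took part
    // in the first match (or null if nothing matched).
    public static String findMatchedAlternative(List<String> alternatives, String input) {
        StringBuilder combined = new StringBuilder();
        for (String alt : alternatives) {
            if (combined.length() > 0) {
                combined.append('|');
            }
            combined.append('(').append(alt).append(')');
        }

        Matcher m = Pattern.compile(combined.toString()).matcher(input);
        if (m.find()) {
            // Group i + 1 belongs to alternatives.get(i); this relies on the
            // assumption that the alternatives add no capturing groups.
            for (int i = 0; i < alternatives.size(); i++) {
                if (m.group(i + 1) != null) {
                    return alternatives.get(i);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> alternatives = Arrays.asList("abc", "de.", "ghi");
        System.out.println(findMatchedAlternative(alternatives, "def")); // prints de.
    }
}
```

This works because Matcher.group(int) returns null for a group that did not participate in the match, so exactly one of the top-level groups is non-null after a successful find().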
Upvotes: 3