gjrwebber

Reputation: 2658

How to find the part of the regexp pattern that matched the input in java?

I have pattern "abc|de.|ghi" and an input of "def". How do I find the part of the pattern that matched "def"? So, in this example the result I want is "de.". Not "def", which is all I have been able to get using group and start/end methods.

In code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("abc|de.|ghi");
Matcher m = p.matcher("def");
if (m.find()) {
    // Here I want to get the String "de." somehow
}

Upvotes: 3


Answers (3)

andrewdotn

Reputation: 34873

There’s no easy way of doing this with Pattern in the standard library.

The source code for Pattern uses a recursive-descent parser to create a tree of Node objects which each support a match() method. For example, to evaluate |, there is a Branch subclass at line 4107 that stores a list of possible alternatives. Its match() method tries each alternative, and returns true if any one of the alternatives matches and the successor node matches. Otherwise it returns false.
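The "try each alternative, then the successor" behavior is visible through the public API. In this sketch (the class and method names are made up for illustration), the first alternative `a` matches, but the successor `c` then fails against `b`, so the branch moves on to `ab`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BranchDemo {
    // Returns the text captured by the alternation group, or null if there is no match.
    public static String matchedAlternative(String input) {
        // Group 1 wraps the alternation; the literal 'c' is the successor of the branch.
        Matcher m = Pattern.compile("(a|ab)c").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // "a" succeeds first, but 'c' then fails against 'b', so the branch retries with "ab".
        System.out.println(matchedAlternative("abc")); // prints "ab"
    }
}
```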

Groups are saved by inserting special GroupHead and GroupTail nodes into the parse tree. When they match, they record the start and end position of each group in the Matcher (in its internal groups array), not in the Pattern itself.

To find out which parts of the pattern caused the match, the node objects would need to know which parts of the pattern they were created from. The parser doesn’t store that information in the nodes it creates. The original pattern is stored in a temp array, and the recursive-descent parser keeps a cursor index into that array as it parses. Parser helper methods like peek() and accept(), which are defined starting at line 1567, just increment cursor as needed. When nodes are created, the value of cursor isn’t stored anywhere, yet that value is exactly what you would need to reconstruct which parts of the pattern correspond to the match.

It’s understandable why Pattern doesn’t save this information: it would slow down all regular expression evaluations, but the additional functionality would hardly ever be used.

One possibility is to create a modified version of Pattern that does the bookkeeping needed to trace matches back to the parts of the pattern they came from. To record which part of the pattern each node corresponds to, you may be able to get away with making the Node() constructor stash a copy of the cursor field. But to use that data to find which parts of the pattern matched, you’ll need to update every Node subclass’s match() method to record the range according to that subclass’s semantics.
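To make that bookkeeping concrete, here is a toy sketch. It is not a patch to java.util.regex.Pattern; it is a hypothetical mini-engine (TracingRegex and findMatchingPart are invented names) that supports only literal characters, `.` and top-level `|`. The parser stashes the cursor before and after each alternative, and the matcher reports the pattern slice of the alternative that succeeded, scanning positions left to right and trying alternatives in order, roughly as find() does for this subset:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy engine, NOT java.util.regex: only literals, '.' and top-level '|'.
public class TracingRegex {
    // A compiled alternative: one predicate per atom, plus the [start, end)
    // slice of the pattern text it was parsed from (the extra bookkeeping).
    private record Alternative(List<IntPredicate> atoms, int patternStart, int patternEnd) {
        // Tries to match every atom consecutively, starting at input position pos.
        boolean matchAt(String input, int pos) {
            if (pos + atoms.size() > input.length()) return false;
            for (int i = 0; i < atoms.size(); i++) {
                if (!atoms.get(i).test(input.charAt(pos + i))) return false;
            }
            return true;
        }
    }

    private final String pattern;
    private final List<Alternative> alternatives = new ArrayList<>();

    public TracingRegex(String pattern) {
        this.pattern = pattern;
        int cursor = 0;
        while (true) {
            int start = cursor; // stash the cursor before parsing this alternative
            List<IntPredicate> atoms = new ArrayList<>();
            while (cursor < pattern.length() && pattern.charAt(cursor) != '|') {
                char c = pattern.charAt(cursor++);
                // '.' matches any character; everything else is a literal.
                IntPredicate atom = (c == '.') ? ch -> true : ch -> ch == c;
                atoms.add(atom);
            }
            // Store the pattern range alongside the compiled atoms.
            alternatives.add(new Alternative(atoms, start, cursor));
            if (cursor == pattern.length()) break;
            cursor++; // consume the '|'
        }
    }

    // Returns the pattern text of the first alternative that matches at the
    // leftmost possible input position, or null if none matches anywhere.
    public String findMatchingPart(String input) {
        for (int pos = 0; pos <= input.length(); pos++) {
            for (Alternative alt : alternatives) {
                if (alt.matchAt(input, pos)) {
                    return pattern.substring(alt.patternStart(), alt.patternEnd());
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(new TracingRegex("abc|de.|ghi").findMatchingPart("def")); // prints "de."
    }
}
```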

Good luck!

Upvotes: 1

Anirudha

Reputation: 32827

So, you want to return the regex that matched the input. I don't think there's a method in the Pattern or Matcher class that returns the exact alternative of a pattern separated by |.

So, you could do it this way:

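import java.util.regex.Pattern;

public static String getMatchedRegexPattern(String inputRegex, String input)
{
    // Reject unescaped parentheses or brackets: a '|' inside a group or
    // character class would be split incorrectly below.
    if (Pattern.compile("(?<!\\\\)[\\(\\)\\[\\]]").matcher(inputRegex).find())
        throw new IllegalArgumentException("Groups and brackets are not supported");
    // Split only on a '|' that is not escaped with a backslash.
    for (String regex : inputRegex.split("(?<!\\\\)\\|"))
    {
        // matches() requires the alternative to cover the whole input.
        if (Pattern.compile(regex).matcher(input).matches())
            return regex;
    }
    return ""; // no alternative matched
}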

You could call it as:

getMatchedRegexPattern("abc|de.|ghi", "def"); // returns "de."

Upvotes: 1

Devon_C_Miller

Reputation: 16518

It's kind of roundabout, but you can do what you want with capture groups:

Pattern p = Pattern.compile("(abc)|(de.)|(ghi)");
Matcher m = p.matcher("def");
if(m.find()) {
    if (m.group(1) != null)
        System.out.println("Matched \"abc\"");
    if (m.group(2) != null)
        System.out.println("Matched \"de.\"");
    if (m.group(3) != null)
        System.out.println("Matched \"ghi\"");
}
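If the alternatives come from a list, the capture-group wrapping can be generated instead of written by hand. The sketch below is one way to do it, not a standard API: the class, the method and the alt0, alt1, ... group names are hypothetical. Named groups are used so that capturing groups inside an alternative don't shift the numbering; it still assumes each alternative is a well-formed regex on its own and doesn't already define a group called altN.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternativeFinder {
    // Wraps each alternative in a named group (alt0, alt1, ...) and returns the
    // alternative whose group took part in the first match, or null if none did.
    public static String findMatchedAlternative(List<String> alternatives, String input) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < alternatives.size(); i++) {
            if (i > 0) regex.append('|');
            regex.append("(?<alt").append(i).append('>').append(alternatives.get(i)).append(')');
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(input);
        if (!m.find()) return null;
        for (int i = 0; i < alternatives.size(); i++) {
            // start(name) is -1 when that group did not participate in the match.
            if (m.start("alt" + i) != -1) return alternatives.get(i);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findMatchedAlternative(List.of("abc", "de.", "ghi"), "def")); // prints "de."
    }
}
```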

Upvotes: 3
