dsg
dsg

Reputation: 13004

Efficiently finding all overlapping matches for a regular expression

This is a followup to All overlapping substrings matching a java regex.

Is there a way to make this code faster?

public static void allMatches(String text, String regex)
  {
    for (int i = 0; i < text.length(); ++i) {
      for (int j = i + 1; j <= text.length(); ++j) {
        String positionSpecificPattern = "((?<=^.{"+i+"})("+regex+")(?=.{"+(text.length() - j)+"}$))";
        Matcher m = Pattern.compile(positionSpecificPattern).matcher(text);

        if (m.find()) 
        {   
          System.out.println("Match found: \"" + (m.group()) + "\" at position [" + i + ", " + j + ")");
        }   
      }   
    }   
  }

Upvotes: 0

Views: 1881

Answers (1)

Alan Moore
Alan Moore

Reputation: 75232

In the other question you mentioned Matcher's region() method, but you weren't making full use of it. What makes it so valuable is that the anchors will match at the region's bounds as if they were the bounds of a standalone string. That's assuming you've got the useAnchoringBounds() option set, but that's the default setting.

public static void allMatches(String text, String regex)
{
  Matcher m = Pattern.compile(regex).matcher(text);
  int end = text.length();
  for (int i = 0; i < end; ++i)
  {
    for (int j = i + 1; j <= end; ++j) 
    {
      m.region(i, j);

      if (m.find()) 
      {   
        System.out.printf("Match found: \"%s\" at position [%d, %d)%n",
                          m.group(), i, j);
      }   
    }   
  }   
}

Given your sample string and regex:

allMatches("String t = 04/31 412-555-1235;", "^\\d\\d+$");

...I get this output:

Match found: "04" at position [11, 13)
Match found: "31" at position [14, 16)
Match found: "41" at position [17, 19)
Match found: "412" at position [17, 20)
Match found: "12" at position [18, 20)
Match found: "55" at position [21, 23)
Match found: "555" at position [21, 24)
Match found: "55" at position [22, 24)
Match found: "12" at position [25, 27)
Match found: "123" at position [25, 28)
Match found: "1235" at position [25, 29)
Match found: "23" at position [26, 28)
Match found: "235" at position [26, 29)
Match found: "35" at position [27, 29)

Upvotes: 2

Related Questions