DixonD

Reputation: 6628

Is regex in Java anchored by default with both a ^ and $ character?

As I understand regular expressions, the string "00###" should match "[0-9]" but not "^[0-9]$". That is not how Java's regular expressions behave, though.

After looking into the problem, I found the following statement (http://www.wellho.net/solutions/java-regular-expressions-in-java.html):

It might appear that Java regular expressions are default anchored with both a ^ and $ character.

Is this true for all versions of the JDK? And can this behavior be turned off, i.e. can the default anchoring with ^ and $ be disabled?

Upvotes: 28

Views: 8261

Answers (3)

Mark Byers

Reputation: 838126

As the article you linked to explains, it depends on the method you call. String#matches and Matcher#matches require the whole input to match, as if the pattern were wrapped in ^ and $. If you don't want that, use Matcher#find instead, which looks for a match anywhere in the input.

import java.util.regex.*;

public class Example
{
    public static void main(String[] args)
    {
        System.out.println("Matches: " + "abc".matches("a+"));

        Matcher matcher = Pattern.compile("a+").matcher("abc");
        System.out.println("Find: " + matcher.find());
    }
}

Output:

Matches: false
Find: true

Upvotes: 29

Alan Moore

Reputation: 75222

Yes, matches() always acts as if the regex were anchored at both ends. To get the traditional behavior, which is to match any substring of the target, you have to use find() (as others have already pointed out). Very few regex tools offer anything equivalent to Java's matches() methods, so your confusion is justified. The only other one I can think of offhand is the XML Schema flavor.
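To tie this back to the string from the question, here is a minimal sketch comparing the two methods on "00###". The class name AnchoringDemo and the helper names matchesWhole and findsAnywhere are made up for illustration; only Pattern, Matcher, matches() and find() come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoringDemo {
    // Hypothetical helper: true only if the regex matches the entire input,
    // which is what Matcher#matches (and String#matches) checks.
    static boolean matchesWhole(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches();
    }

    // Hypothetical helper: true if the regex matches any substring of the input,
    // which is the "traditional" behavior provided by Matcher#find.
    static boolean findsAnywhere(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find();
    }

    public static void main(String[] args) {
        String input = "00###";
        System.out.println(matchesWhole("[0-9]", input));    // false: one digit is not the whole input
        System.out.println(findsAnywhere("[0-9]", input));   // true: a digit occurs somewhere
        System.out.println(findsAnywhere("^[0-9]$", input)); // false: explicit anchors still apply to find()
    }
}
```

So the anchoring is a property of the method being called, not a mode of the pattern, and find() gives the behavior the question expected.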

Upvotes: 7

seh

Reputation: 15259

In addition to Mr. Byers's answer, note too that Matcher#find() picks up where its last successful match left off. That only matters for repeated use of a Matcher instance, but that's the feature that allows emulation of Perl's \G assertion. It's also useful in concert with Matcher#usePattern(Pattern), where you use one pattern to find some prefix and then swap in a repeating pattern (including \G) to loop over repeated matches with Matcher#find().
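A rough sketch of that prefix-then-repeat idea follows. The input format ("items:" followed by space-separated numbers), the class name PrefixThenRepeat and the method numbersAfterPrefix are all invented for the example; the only library calls used are Pattern.compile, Matcher#find, Matcher#usePattern and Matcher#group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixThenRepeat {
    // Hypothetical patterns: a literal prefix, then a \G-anchored repeating pattern.
    private static final Pattern PREFIX = Pattern.compile("items:");
    private static final Pattern NUMBER = Pattern.compile("\\G\\s*(\\d+)");

    // Collects the numbers that directly follow the prefix, stopping at the
    // first position where the repeating pattern no longer matches contiguously.
    static List<String> numbersAfterPrefix(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher m = PREFIX.matcher(input);
        if (!m.find()) {
            return numbers; // no prefix, nothing to collect
        }
        // Swap patterns; the matcher keeps its position in the input.
        m.usePattern(NUMBER);
        // Each find() resumes where the previous match ended, and \G
        // forces the next match to start exactly there.
        while (m.find()) {
            numbers.add(m.group(1));
        }
        return numbers;
    }

    public static void main(String[] args) {
        // "78" is skipped because "and" breaks the contiguous run.
        System.out.println(numbersAfterPrefix("items: 12 34 56 and 78"));
    }
}
```

Without the \G, the loop would also pick up the trailing 78, since an unanchored find() is free to skip ahead to the next place the pattern fits.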

There's also Matcher#lookingAt(), which is implicitly anchored at the beginning (as if by ^) but not at the end. I prefer to think that name was inspired by the Emacs function looking-at.
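For comparison, a small sketch showing how lookingAt() sits between matches() and find(). The class name LookingAtDemo and the helper startsWithMatch are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookingAtDemo {
    // Hypothetical helper: true if the regex matches a prefix of the input.
    static boolean startsWithMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt();
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("[0-9]+");

        // Digits at the start: lookingAt() succeeds, matches() does not.
        System.out.println(digits.matcher("00###").lookingAt()); // true
        System.out.println(digits.matcher("00###").matches());   // false

        // Digits only at the end: find() succeeds, lookingAt() does not.
        System.out.println(digits.matcher("###00").find());      // true
        System.out.println(digits.matcher("###00").lookingAt()); // false
    }
}
```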

Upvotes: 4
