Reputation: 6628
From my understanding of regular expressions, the string "00###" should match "[0-9]" but not "^[0-9]$". That is not what happens with Java regexes, though.
After investigating this problem, I found the following information (http://www.wellho.net/solutions/java-regular-expressions-in-java.html):
It might appear that Java regular expressions are default anchored with both a ^ and $ character.
Can we be sure this holds for all JDK versions? And can this mode be turned off (i.e., can the default anchoring with ^ and $ be disabled)?
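A minimal sketch that reproduces the behavior described above with the question's own string. The class and helper names (QuestionDemo, wholeMatch, foundAnywhere) are illustrative only and do not come from the original post.

```java
import java.util.regex.Pattern;

class QuestionDemo {
    // Whole-string check: String#matches succeeds only if the regex matches the entire input.
    static boolean wholeMatch(String input, String regex) {
        return input.matches(regex);
    }

    // Substring search: Matcher#find looks for the regex anywhere in the input.
    static boolean foundAnywhere(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String s = "00###";
        System.out.println(wholeMatch(s, "[0-9]"));      // false
        System.out.println(foundAnywhere(s, "[0-9]"));   // true
        System.out.println(foundAnywhere(s, "^[0-9]$")); // false
    }
}
```

With find(), the explicit ^ and $ behave the way the question expects.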
Reputation: 838126
As the article you linked to explains, it depends on which method you call. If you want ^ and $ added by default, use String#matches or Matcher#matches. If you don't want that, use the Matcher#find method instead.
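import java.util.regex.*;

public class Example
{
    public static void main(String[] args)
    {
        // String#matches: the whole string must match, as if anchored at both ends
        System.out.println("Matches: " + "abc".matches("a+"));

        // Matcher#find: searches for the pattern anywhere in the input
        Matcher matcher = Pattern.compile("a+").matcher("abc");
        System.out.println("Find: " + matcher.find());
    }
}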
Output:
Matches: false
Find: true
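To make "adding ^ and $ by default" concrete, the sketch below compares Matcher#matches with Matcher#find on the same pattern wrapped in explicit anchors. It assumes \A and \z (start and absolute end of input) as the anchors, because in Java a plain $ can also match just before a final line terminator. For ordinary patterns the two checks should agree. AnchorDemo, viaMatches and viaAnchoredFind are hypothetical names.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Matcher#matches: the pattern must consume the entire input.
    static boolean viaMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Matcher#find on the pattern wrapped in \A(?:...)\z, i.e. explicit whole-input anchors.
    static boolean viaAnchoredFind(String regex, String input) {
        return Pattern.compile("\\A(?:" + regex + ")\\z").matcher(input).find();
    }

    public static void main(String[] args) {
        for (String input : new String[] {"abc", "aaa"}) {
            System.out.println(input + ": matches=" + viaMatches("a+", input)
                    + ", anchored find=" + viaAnchoredFind("a+", input));
        }
        // abc: matches=false, anchored find=false
        // aaa: matches=true, anchored find=true
    }
}
```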
Reputation: 75222
Yes, matches() always acts as if the regex were anchored at both ends. To get the traditional behavior, which is to match any substring of the target, you have to use find() (as others have already pointed out). Very few regex tools offer anything equivalent to Java's matches() methods, so your confusion is justified. The only other one I can think of offhand is the XML Schema flavor.
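The "match any substring" behavior is easiest to see by calling find() in a loop, which reports every non-overlapping match in the input. This is a minimal sketch; FindAllDemo and findAll are hypothetical names, and the input string is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Collect every non-overlapping substring of input that the regex matches.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("[0-9]+", "00###12#3")); // [00, 12, 3]
        System.out.println("00###12#3".matches("[0-9]+"));  // false: the whole string is not digits
    }
}
```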
Reputation: 15259
In addition to Mr. Byers's answer, note that Matcher#find() picks up where its last successful match left off. That only matters when you call it repeatedly on the same Matcher instance, but it's the feature that makes Perl's \G assertion (end of the previous match) possible. It's also useful together with Matcher#usePattern(Pattern): you use one pattern to find some prefix, then swap in a repeating pattern (including \G) and loop over the repeated matches with Matcher#find().
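A minimal sketch of the usePattern technique, assuming a made-up input format such as "nums:1,2,3"; the format, the class UsePatternDemo and the helper numbersAfterPrefix are hypothetical. The matcher keeps its position in the input when the pattern is swapped, so \G ties each number to the end of the previous match and the loop stops at the first gap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UsePatternDemo {
    // Collect the comma-separated numbers that directly follow a "nums:" prefix.
    static List<String> numbersAfterPrefix(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile("nums:").matcher(input);
        if (!m.find()) {
            return result; // prefix not present
        }
        // Swap in the repeating pattern; the matcher's position stays at the end of "nums:".
        m.usePattern(Pattern.compile("\\G(\\d+),?"));
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(numbersAfterPrefix("nums:1,2,3")); // [1, 2, 3]
        System.out.println(numbersAfterPrefix("nums:1,x,3")); // [1] (\G stops at the "x")
    }
}
```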
There's also Matcher#lookingAt(), which is implicitly anchored at the beginning (^) but not at the end. I like to think the name was inspired by the Emacs function looking-at.
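A small sketch comparing the three Matcher methods on the same inputs; LookingAtDemo and describe are hypothetical names. It calls reset() before find() so that the search starts from the beginning of the input regardless of the earlier calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookingAtDemo {
    // Report matches(), lookingAt() and find() for one regex/input pair.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean matches = m.matches();     // anchored at both ends
        boolean lookingAt = m.lookingAt(); // anchored at the start only
        m.reset();                         // make find() search from the start of the input
        boolean find = m.find();           // not anchored
        return "matches=" + matches + ", lookingAt=" + lookingAt + ", find=" + find;
    }

    public static void main(String[] args) {
        System.out.println(describe("a+", "abc")); // matches=false, lookingAt=true, find=true
        System.out.println(describe("b+", "abc")); // matches=false, lookingAt=false, find=true
        System.out.println(describe("a+", "aaa")); // matches=true, lookingAt=true, find=true
    }
}
```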