Maarten Bodewes

Reputation: 93968

String#contains using Pattern

Suppose I want to make a 100% clone of String#contains(CharSequence s): boolean using Java regex via Pattern. Would the following calls be identical?

input.contains(s);

and

Pattern.compile(".*" + Pattern.quote(s) + ".*").matcher(input).matches();

Similarly, would the following code have the same functionality?

Pattern.compile(Pattern.quote(s)).matcher(input).find();

I presume that the regex search is less performant, but only by a constant factor. Is this correct? Is there any way to optimize the regular expressions to mimic contains?


The reason I'm asking is that I have a piece of code written around Pattern, and it seems wasteful to create a separate piece of code that uses contains. On the other hand, I don't want different test results, even minor ones, between the two code paths. Are there any Unicode-related differences, for instance?

Upvotes: 1

Views: 434

Answers (3)

Maarten Bodewes

Reputation: 93968

This is just to share how I decided to solve this little conundrum. I've redesigned my library to take a predicate instead of a Pattern, like this:

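import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// The class name is arbitrary; it is only here so the demonstration compiles.
public class Finder {

    // Accepts every input.
    public static Set<String> findAll() {
        return find(input -> true);
    }

    // Plain substring search, no regex involved.
    public static Set<String> findSubstring(String s) {
        return find(input -> input.contains(s));
    }

    // asPredicate() has find() semantics: the pattern may match anywhere.
    public static Set<String> findPattern(Pattern p) {
        return find(p.asPredicate());
    }

    // Quoted literal, so regex metacharacters in s are matched verbatim.
    public static Set<String> findCaseInsensitiveSubstring(String s) {
        return find(Pattern.compile(Pattern.quote(s), Pattern.CASE_INSENSITIVE).asPredicate());
    }

    private static Set<String> find(Predicate<String> matcher) {
        var testInput = Set.of("some", "text", "to", "test");
        return testInput.stream().filter(matcher).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(findAll());
        System.out.println(findSubstring("t"));
        System.out.println(findPattern(Pattern.compile("^[^s]")));
        System.out.println(findCaseInsensitiveSubstring("T"));
    }
}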

where I've used all the comments and answers given up to now.

Note that there is also Pattern#asMatchPredicate() in case matching is required instead, e.g. for a function matchPattern.
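
To make the difference between the two predicates concrete, here is a minimal sketch. The class and field names are made up for illustration, and asMatchPredicate() needs Java 11 or later:

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the library described above.
class PredicateKinds {
    static final Pattern P = Pattern.compile(Pattern.quote("es"));

    // find() semantics: true if the pattern occurs anywhere in the input.
    static final Predicate<String> FINDS = P.asPredicate();

    // matches() semantics: true only if the whole input matches (Java 11+).
    static final Predicate<String> MATCHES = P.asMatchPredicate();

    public static void main(String[] args) {
        System.out.println(FINDS.test("test"));   // true: "es" occurs inside "test"
        System.out.println(MATCHES.test("test")); // false: "test" as a whole is not "es"
        System.out.println(MATCHES.test("es"));   // true: the whole input is "es"
    }
}
```

So only asPredicate() behaves like contains for a quoted literal; asMatchPredicate() behaves like equals in that case.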

Of course, the above is just a demonstration, not the actual functions in my solution.

Upvotes: 1

Harm van der Wal

Reputation: 102

There are two ways to check whether a String contains a match for a Pattern:

return Pattern.compile(Pattern.quote(s)).asPredicate().test(input);

or

return Pattern.compile(Pattern.quote(s)).matcher(input).find();

There is no need to match on .* at either end. It would only match whatever surrounds the actual result, which is pure overhead.
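
If you want some reassurance that the find() variant agrees with contains, a small spot check along these lines may help. This is only a sketch: the class and method names are invented here, and the handful of inputs is nowhere near an exhaustive proof (it does not probe Unicode edge cases, for instance):

```java
import java.util.regex.Pattern;

// Hypothetical helper for comparing String#contains with a quoted-literal find().
class ContainsSpotCheck {
    // Regex-based substring test: quote the needle so it is taken literally.
    static boolean viaFind(String input, String s) {
        return Pattern.compile(Pattern.quote(s)).matcher(input).find();
    }

    // True when both approaches give the same answer for this pair.
    static boolean agrees(String input, String s) {
        return input.contains(s) == viaFind(input, s);
    }

    public static void main(String[] args) {
        // Each row is {input, needle}: metacharacters, empty needle,
        // a needle containing \E, and a needle after a line break.
        String[][] cases = {
            {"a.c", "."}, {"abc", "."}, {"abc", ""}, {"", ""},
            {"x\\Ey", "\\E"}, {"first\nsecond", "second"}
        };
        for (String[] c : cases) {
            System.out.println(agrees(c[0], c[1]));
        }
    }
}
```

Pattern.quote is what keeps "." from acting as a wildcard here, and it also copes with a needle that itself contains \E.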

Upvotes: 1

Wiktor Stribiżew

Reputation: 626758

If you need to write a .contains-like method based on Pattern, you should choose the Matcher#find() version:

Pattern.compile(Pattern.quote(s)).matcher(input).find()

If you want to use .matches(), you should bear in mind that:

  • .* does not match line breaks by default, so you need the (?s) inline modifier at the start of the pattern or the Pattern.DOTALL option.
  • The .* at the start of the pattern can cause a lot of backtracking, and you may get a stack overflow exception, or the code execution might just freeze.
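
The line break issue is easy to reproduce. Below is a minimal sketch; the class and method names are just for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing find() with the ".*" + matches() variant.
class ContainsViaPattern {
    // find() searches for the quoted literal anywhere in the input.
    static boolean viaFind(String input, String s) {
        return Pattern.compile(Pattern.quote(s)).matcher(input).find();
    }

    // matches() must cover the whole input, so ".*" has to span line breaks too.
    static boolean viaMatches(String input, String s, boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        return Pattern.compile(".*" + Pattern.quote(s) + ".*", flags)
                .matcher(input)
                .matches();
    }

    public static void main(String[] args) {
        String input = "first line\nsecond line";
        System.out.println(input.contains("second"));           // true
        System.out.println(viaFind(input, "second"));           // true
        System.out.println(viaMatches(input, "second", false)); // false: ".*" stops at the line break
        System.out.println(viaMatches(input, "second", true));  // true: DOTALL lets "." match '\n'
    }
}
```

Without DOTALL, the matches() variant silently disagrees with contains as soon as the input has more than one line, which is one more reason to prefer find().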

Upvotes: 3
