Nathan Spears

Reputation: 1667

Negating literal strings in a Java regular expression

Regular expression quantifiers seem to match as much text as possible by default. For instance:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "ClarkRalphKentGuyGreenGardnerClarkSupermanKent";
        // Greedy .* runs to the last "Kent" in the string.
        Pattern p = Pattern.compile("Clark.*Kent", Pattern.CASE_INSENSITIVE);
        Matcher myMatcher = p.matcher(s);
        int i = 1;
        while (myMatcher.find()) {
            System.out.println(i++ + ". " + myMatcher.group());
        }
    }
}

generates output

  1. ClarkRalphKentGuyGreenGardnerClarkSupermanKent

I would like this output

  1. ClarkRalphKent
  2. ClarkSupermanKent

I have been trying Patterns like:

 Pattern p = Pattern.compile("Clark[^((Kent)*)]Kent", Pattern.CASE_INSENSITIVE);

That doesn't work, but it shows what I'm trying to say: I want each match to run from Clark to Kent without containing any other occurrence of Kent.

This string:

ClarkRalphKentGuyGreenGardnerBruceBatmanKent

should generate output

  1. ClarkRalphKent

Upvotes: 4

Views: 3860

Answers (4)

Gareth Davis

Reputation: 28059

Greedy vs. reluctant quantifiers are your friend here.

Try: Clark.+?Kent
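A minimal sketch of the difference, using the strings from the question. The class name ReluctantDemo and the helper findAll are hypothetical, introduced only for illustration. It also shows one assumption worth checking for your data: .+? requires at least one character between Clark and Kent, while .*? allows none.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper are illustrative only.
class ReluctantDemo {

    // Collects every non-overlapping match of regex in input, ignoring case.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String s = "ClarkRalphKentGuyGreenGardnerClarkSupermanKent";
        // Greedy: one match spanning the whole string.
        System.out.println(findAll("Clark.*Kent", s));
        // Reluctant: [ClarkRalphKent, ClarkSupermanKent]
        System.out.println(findAll("Clark.+?Kent", s));
        // The second string from the question: [ClarkRalphKent]
        System.out.println(findAll("Clark.+?Kent", "ClarkRalphKentGuyGreenGardnerBruceBatmanKent"));
        // .+? needs at least one character in between, so this prints []
        System.out.println(findAll("Clark.+?Kent", "ClarkKent"));
        // .*? accepts zero characters in between: [ClarkKent]
        System.out.println(findAll("Clark.*?Kent", "ClarkKent"));
    }
}
```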

Upvotes: 6

Jonathan Lonowski

Reputation: 123453

When you tried "Clark[^((Kent)*)]Kent", I think you wanted "Clark((?!Kent).)*Kent", which uses a zero-width negative look-ahead: (?!Kent) checks, before each character is consumed, that "Kent" does not start at that position.

Square brackets define a character class, not a sub-pattern. So the regex was trying to find a single character that is not one of (, K, e, n, t, ), or *.
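Both points can be checked with a short sketch. The class name LookaheadDemo and the helper findAll are hypothetical, and "ClarkXKent" is a made-up input chosen to show that the bracket version accepts exactly one character between Clark and Kent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper are illustrative only.
class LookaheadDemo {

    // Collects every non-overlapping match of regex in input, ignoring case.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group()); // group() is the whole match, not the inner group
        }
        return matches;
    }

    public static void main(String[] args) {
        String s = "ClarkRalphKentGuyGreenGardnerClarkSupermanKent";

        // Negative look-ahead: consume a character only if "Kent" does not start there.
        String lookahead = "Clark((?!Kent).)*Kent";
        System.out.println(findAll(lookahead, s)); // [ClarkRalphKent, ClarkSupermanKent]

        // The bracket version is a character class matching exactly ONE character
        // that is not one of ( K e n t ) *, so it finds nothing in s.
        String charClass = "Clark[^((Kent)*)]Kent";
        System.out.println(findAll(charClass, s));            // []
        System.out.println(findAll(charClass, "ClarkXKent")); // [ClarkXKent]
    }
}
```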

Upvotes: 3

Adrian Pronk

Reputation: 13906

Use the reluctant ? suffix: Clark.*?Kent. The quantifiers ?, *, and + can each be followed by ? to indicate that they should stop as soon as possible.

see http://perldoc.perl.org/perlre.html
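To illustrate how the ? suffix changes each of the three quantifiers in Java, here is a small sketch. The class name QuantifierDemo, the helper firstMatch, and the input "aaa" are assumptions made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper are illustrative only.
class QuantifierDemo {

    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "aaa";
        String[] patterns = {"a?", "a??", "a*", "a*?", "a+", "a+?"};
        for (String p : patterns) {
            // Quote the result so that empty matches are visible.
            System.out.println(p + " -> \"" + firstMatch(p, input) + "\"");
        }
        // Greedy forms take as much as they can:      a? -> "a",  a* -> "aaa", a+ -> "aaa"
        // Reluctant forms take as little as they can: a?? -> "", a*? -> "",    a+? -> "a"
    }
}
```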

Upvotes: 2

Michael Borgwardt

Reputation: 346260

You want a "reluctant" rather than a "greedy" quantifier. Simply putting a ? after your * should do the trick.

Upvotes: 4
