Qasim Idrees

Reputation: 73

Regex for a word that is not preceded by specific words

I am looking for a regex that produces this result:

String = This is Cold Water and this is Hot Water, have some Water.

I want to check whether this String contains the word 'Water' without the word 'Cold' or 'Hot' before it.

String mydata = "This is Cold Water and this is Hot Water, have some Water";
Pattern pattern = Pattern.compile("[^(Cold|Hot)]\sWater");
Matcher matcher = pattern.matcher(mydata);
if (matcher.matches()) {
    String s = matcher.group(1);
    System.out.println(s);
}

But it results in no match.

Upvotes: 4

Views: 3236

Answers (1)

Wiktor Stribiżew

Reputation: 626748

The [^(Cold|Hot)]\sWater pattern matches any single char other than (, C, o ... ), followed by a single whitespace and then the substring Water. The [^...] construct is a negated character class: it excludes individual chars, so you can't negate sequences of chars with it.
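
To make the character-class behaviour concrete, here is a minimal sketch that runs the pattern from the question with find(). The class name NegatedClassDemo and the helper firstMatch are made up for illustration, and the "Red Water" input is an assumed extra test string that is not in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {
    // The pattern from the question, with the backslash doubled for a Java string literal.
    static final Pattern QUESTION_PATTERN = Pattern.compile("[^(Cold|Hot)]\\sWater");

    // Hypothetical helper: returns the first substring the pattern finds, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = QUESTION_PATTERN.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // 'e' is not one of the listed chars, so the class consumes it: prints "e Water".
        System.out.println(firstMatch("have some Water"));
        // 'd' is one of the listed chars, so "Red Water" is rejected just like "Cold Water": prints "null".
        System.out.println(firstMatch("Red Water"));
    }
}
```

Note that the match includes the char consumed by the class, and that any word ending in one of the listed chars is rejected, whether or not it is Cold or Hot.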

You may use a regex with a negative lookbehind. The most basic form of it for your case is (?<!Cold\s|Hot\s), and you may further customize it.

For example, the \s only matches 1 whitespace, and the lookbehind won't work if there are 2 or more whitespaces between Cold and Water or Hot and Water. In Java regex, you may use limiting quantifiers (see Constrained-width Lookbehind), so you may use \s{1,10} to allow the lookbehind to "see" 1 to 10 whitespaces behind.
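
A small sketch of that difference, assuming a test string with two spaces between Cold and Water (the class name LookbehindWhitespaceDemo and the helper found are illustrative only):

```java
import java.util.regex.Pattern;

class LookbehindWhitespaceDemo {
    // Basic form: the lookbehind only "sees" exactly one whitespace char.
    static final Pattern SINGLE_SPACE = Pattern.compile("(?<!Cold\\s|Hot\\s)Water");
    // Bounded form: the lookbehind accepts 1 to 10 whitespace chars.
    static final Pattern UP_TO_TEN = Pattern.compile("(?<!(?:Cold|Hot)\\s{1,10})Water");

    // Hypothetical helper: true if the pattern matches anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String twoSpaces = "Cold  Water"; // two spaces between the words
        // The single-\s lookbehind misses "Cold" here, so Water is wrongly accepted: prints "true".
        System.out.println(found(SINGLE_SPACE, twoSpaces));
        // The bounded lookbehind sees "Cold" plus two spaces, so Water is rejected: prints "false".
        System.out.println(found(UP_TO_TEN, twoSpaces));
    }
}
```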

Another enhancement could be whole word matching: enclose the words with \b, the word boundary construct.

Note that Matcher#matches() requires the whole string to match the pattern. To look for a match anywhere inside the string, use Matcher#find().
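
The difference between the two methods can be seen with a plain Water pattern. This is only a sketch; the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    static final Pattern WATER = Pattern.compile("Water");

    // matches() succeeds only if the pattern covers the entire input.
    static boolean wholeInputMatches(String input) {
        return WATER.matcher(input).matches();
    }

    // find() succeeds if the pattern occurs anywhere in the input.
    static boolean containsMatch(String input) {
        return WATER.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInputMatches("have some Water")); // prints "false"
        System.out.println(containsMatch("have some Water"));     // prints "true"
        System.out.println(wholeInputMatches("Water"));           // prints "true"
    }
}
```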

Here is an example solution:

String mydata = "This is Cold Water and this is Hot Water, have some Water";
Pattern pattern = Pattern.compile("\\b(?<!(?:\\bCold\\b|\\bHot\\b)\\s{1,10})Water\\b");
Matcher matcher = pattern.matcher(mydata);
if (matcher.find()) {
    System.out.println(matcher.group(0));
}

See the Java online demo.

Pattern details

  • \\b - a word boundary
  • (?<! - start of the negative lookbehind that fails the match if, immediately to the left of the current location, there is:
    • (?: - start of a non-capturing group matching either of the two alternatives:
      • \\bCold\\b - a whole word Cold
      • | - or
      • \\bHot\\b - a whole word Hot
    • ) - end of the non-capturing group
    • \\s{1,10} - 1 to 10 whitespaces (you may use \s if you are sure there will only be 1 whitespace between the words)
  • ) - end of the lookbehind
  • Water - the search word
  • \\b - a word boundary
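
If you need to know which occurrence matched, a possible variation is to loop with find() and collect the start offsets. The sketch below reuses the pattern explained above; the class name WaterFinder and the method matchStarts are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WaterFinder {
    // Same pattern as in the example solution.
    static final Pattern PATTERN =
            Pattern.compile("\\b(?<!(?:\\bCold\\b|\\bHot\\b)\\s{1,10})Water\\b");

    // Hypothetical helper: start offsets of every accepted "Water" in the input.
    static List<Integer> matchStarts(String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String mydata = "This is Cold Water and this is Hot Water, have some Water";
        // Only the last Water (after "some") is accepted, so the list has one offset.
        System.out.println(matchStarts(mydata));
        // The trailing \b rejects Water inside a longer word: prints "[]".
        System.out.println(matchStarts("Watermelon"));
    }
}
```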

Upvotes: 5
