Zax

Reputation: 199

PHP preg_match() - how to exclude words?

I'm using PHP preg_match() to check whether any of some given words occur in a string. For example, the string:

I have not failed. I've just found 10,000 ways that won't work.

My regex is built on the fly from an HTML form and looks like:

/(apple)|(banana)|(work)/i

In this example, we have a match, but I would like to not match if the string contains "ways". I tried:

/(apple)|(banana)|(work)(?<!ways)/i

But it also returns 1 (matched). What should I add to the regex so that it does not match the example string?

Thanks for your help.

Upvotes: 1

Views: 5875

Answers (2)

ctwheels

Reputation: 22817

Code

The following two methods both work for your problem.

Method 1

This method uses a capture group to capture the words you want to match.


(?:(?:^(?!.*\bways\b)|\G(?!\A)).*?)\b(apple|banana|work)\b

Method 2

This method doesn't capture anything; it simply matches. It uses \K to reset the start of the reported match just before the word, so the whole match is the word itself.


(?:(?:^(?!.*\bways\b)|\G(?!\A)).*?)\K\b(?:apple|banana|work)\b

Results

I used two strings to test against. The first contains the word ways and the second does not (I removed the s so it contains way instead).

Input

I have not failed. I've just found 10,000 ways that won't work. A banana and an apple
I have not failed. I've just found 10,000 way that won't work. A banana and an apple

Output

Since the output would actually cause more confusion if I just pasted it here, I'll just tell you that it matches work, banana, and apple in the second string and not the first (since the first string contains the negated word ways).
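As a rough check of the behaviour described above, here is a minimal PHP sketch that runs both patterns against the two test strings with preg_match_all(). The helper name findWords() and the variable names are made up for illustration; the i flag is assumed from the question.

```php
<?php
// Method 1 reports the words in capture group 1; Method 2 reports them as the whole match (group 0).
$method1 = '/(?:(?:^(?!.*\bways\b)|\G(?!\A)).*?)\b(apple|banana|work)\b/i';
$method2 = '/(?:(?:^(?!.*\bways\b)|\G(?!\A)).*?)\K\b(?:apple|banana|work)\b/i';

$withWays    = "I have not failed. I've just found 10,000 ways that won't work. A banana and an apple";
$withoutWays = "I have not failed. I've just found 10,000 way that won't work. A banana and an apple";

// Hypothetical helper: returns every word found by $pattern, read from group $group.
function findWords(string $pattern, string $subject, int $group): array
{
    preg_match_all($pattern, $subject, $matches);
    return $matches[$group];
}

print_r(findWords($method1, $withoutWays, 1)); // work, banana, apple
print_r(findWords($method2, $withoutWays, 0)); // work, banana, apple
print_r(findWords($method1, $withWays, 1));    // empty array
print_r(findWords($method2, $withWays, 0));    // empty array
```

With plain preg_match() instead of preg_match_all(), either pattern should return 0 for the first string and 1 for the second.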


Explanation

I'll explain the second method since both methods are very similar, but the second method uses an additional token.

  • (?:(?:^(?!.*\bways\b)|\G(?!\A)).*?) Match the following
    • (?:^(?!.*\bways\b)|\G(?!\A)) Match either of the following
      • ^(?!.*\bways\b) Match the following
        • ^ Assert position at the start of the line (the start of the string unless the m flag is set)
        • (?!.*\bways\b) Negative lookahead ensuring the word ways does not follow (\b is a word boundary assertion so that we don't match words that contain ways such as always or wayside as described in the comments below your question)
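      • \G(?!\A) Assert position at the end of the previous match. The (?!\A) lookahead excludes the start of the string, where \G would otherwise also match on the first attempt, so this branch only applies after a first successful match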
    • .*? Match any character any number of times, but as few as possible
  • \K Reset the starting point of the reported match. Any previously consumed characters are no longer included in the final match
  • \b(?:apple|banana|work)\b Match any of the words apple, banana, or work literally, with a word boundary on each side (so as to not match words containing these words such as pineapple or workplace)
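
Since the question mentions that the regex is built on the fly from a form, the pattern explained above could be assembled roughly as follows. This is only a sketch: buildExclusionPattern() and its parameters are hypothetical names, and it assumes both word lists are non-empty and consist of ordinary words (so that \b behaves as expected).

```php
<?php
// Hypothetical helper: builds the Method 2 pattern from user-supplied word lists.
// Assumes both arrays are non-empty.
function buildExclusionPattern(array $wanted, array $excluded): string
{
    // preg_quote() escapes regex metacharacters; '/' is passed because it is the delimiter.
    $quote = function (string $word): string {
        return preg_quote($word, '/');
    };
    $want  = implode('|', array_map($quote, $wanted));
    $avoid = implode('|', array_map($quote, $excluded));

    return '/(?:(?:^(?!.*\b(?:' . $avoid . ')\b)|\G(?!\A)).*?)\K\b(?:' . $want . ')\b/i';
}

$pattern = buildExclusionPattern(['apple', 'banana', 'work'], ['ways']);

// Contains the excluded word "ways", so nothing matches.
var_dump(preg_match($pattern, "I have not failed. I've just found 10,000 ways that won't work.")); // int(0)

// No excluded word, and "apple" is present.
var_dump(preg_match($pattern, "An apple a day")); // int(1)
```

Escaping the words with preg_quote() matters here because form input may contain characters such as . or / that would otherwise change or break the pattern.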

Upvotes: 3

Neutral

Reputation: 74

<?php
$your_word = 'way'; // prints "work"
// $your_word = 'ways'; // prints nothing

$subject = "I have not failed. I've just found 10,000 $your_word that won't work.";

// Step 1: check whether the excluded word is present.
if (preg_match('/ways/i', $subject)) {
    // Excluded word found: report no match.
} elseif (preg_match('/(apple)|(banana)|(work)/i', $subject, $z)) {
    // Step 2: excluded word absent, so look for the wanted words in the same string.
    echo $z[0];
}

?>

Upvotes: -1
