kig

Reputation: 5

Regex to split a string by spaces or a " NOT " sequence (PHP)?

I want to split a string on spaces, except where the word " NOT " appears: there I want to split only on the space before the "NOT", and not on the space after it. Example:

"cancer disease NOT brain NOT sickle"

should become:

["cancer", "disease", "NOT brain", "NOT sickle"]

Here is what I have so far, but it is incorrect:

$splitKeywordArr = preg_split('/[^(NOT)]( )/', "cancer disease NOT brain NOT sickle");

It results in:

["cance", "diseas", "NOT brai", "NOT sickle"]

I know why it is incorrect, but I don't know how to fix it.

Upvotes: 0

Views: 54

Answers (2)

Jan

Reputation: 43169

You may use

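<?php

$text = "cancer disease NOT brain NOT sickle";
// "NOT" plus its trailing whitespace is matched and then discarded via (*SKIP)(*FAIL),
// so only the remaining runs of whitespace act as split points.
$pattern = '~NOT\s+(*SKIP)(*FAIL)|\s+~';

print_r(preg_split($pattern, $text));
?>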

Which yields

Array
(
    [0] => cancer
    [1] => disease
    [2] => NOT brain
    [3] => NOT sickle
)

See a demo on ideone.com.
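
As a rough sketch of why this works: the first alternative, NOT\s+, consumes the word NOT together with the whitespace after it, and (*SKIP)(*FAIL) then rejects that match and resumes scanning after it, so that whitespace is never offered as a split point. Only the second alternative, \s+, produces delimiters. The variant below adds a \b word boundary, which is an assumption about the desired behaviour rather than part of the pattern above (without it, the tail of a word such as KNOT is also treated as a NOT). The function name splitKeepingNot is hypothetical.

```php
<?php

// Hypothetical helper: split on whitespace, except the whitespace after the word NOT.
function splitKeepingNot(string $text): array
{
    // \bNOT\s+(*SKIP)(*FAIL): match "NOT" plus whitespace, then reject it and
    //   resume after it, so that whitespace never becomes a delimiter.
    // \s+: every other run of whitespace is a delimiter.
    // \b is an added assumption: only the standalone word NOT is protected.
    return preg_split('~\bNOT\s+(*SKIP)(*FAIL)|\s+~', $text);
}

print_r(splitKeepingNot("cancer disease NOT brain NOT sickle"));

// Words that merely end in "NOT" still split normally:
print_r(splitKeepingNot("KNOT brain NOT sickle"));
```

The first call gives the same four elements as above; the second gives "KNOT", "brain" and "NOT sickle".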

Upvotes: 2

The fourth bird

Reputation: 163362

Instead of splitting, you might also match the terms directly: optional repetitions of the word NOT, followed by 1+ word characters. The repetition covers the case where NOT occurs several times in a row.

(?:\bNOT\h+)*\w+

The pattern matches:

  • (?: Open a non-capturing group
  • \bNOT\h+ A word boundary, then NOT and 1 or more horizontal whitespace chars
  • )* Close the non-capturing group and repeat it 0 or more times
  • \w+ Match 1+ word characters


$str = "cancer disease NOT brain NOT sickle";
// $matches[0] holds the full matches, one per term.
preg_match_all('/(?:\bNOT\h+)*\w+/', $str, $matches);
print_r($matches[0]);

Output

Array
(
    [0] => cancer
    [1] => disease
    [2] => NOT brain
    [3] => NOT sickle
)
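
To illustrate the "multiple times after each other" case, here is a small sketch that wraps the same pattern in a helper (the name matchTerms is hypothetical) and runs it on two further inputs. It also shows one way this approach differs from splitting on whitespace: because only \w+ is matched, a character such as a hyphen separates terms.

```php
<?php

// Hypothetical helper: collect terms, keeping any leading NOTs attached.
function matchTerms(string $str): array
{
    preg_match_all('/(?:\bNOT\h+)*\w+/', $str, $matches);
    return $matches[0]; // full matches only
}

// Consecutive NOTs stay attached to the word that follows them.
print_r(matchTerms("cancer NOT NOT brain"));

// Only word characters are matched, so a hyphen separates terms.
print_r(matchTerms("breast-cancer NOT brain"));
```

The first call returns "cancer" and "NOT NOT brain"; the second returns "breast", "cancer" and "NOT brain". If hyphenated terms should stay intact, the splitting approach may be the better fit.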

Upvotes: 1
