hold3n

Reputation: 75

Negate a regular expression

I would like to ask you to help me with the following regular expression:

(start{(.*?)}((?:(start{(.*?)}.*?end{(.*?)})|.)*?)end{(\2)})

I want to negate it. Assuming that we have the following input:

start{outer1}
  Recipe A:
    start{inner1}
      -ingredient1
      -ingredient2
    end{inner1}
end{outer1}
... something ... 
... blah blah blah ...    
start{outer2}
  Recipe B:
    start{inner1}
      - ingredient1
    end{inner1}
end{outer2}
... something ...

output should be:

... something ... 
... blah blah blah ...
... something ...

I really appreciate any help you can provide.

Upvotes: 1

Views: 204

Answers (3)

p.s.w.g

Reputation: 149010

It depends on what language / environment you're using, but almost universally, any regular expression engine that supports matching also supports splitting. When splitting, the parts of the input between the matches are returned in an array (or an analogous structure).

For example in C#:

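// A verbatim string (@"...") is needed so that \2 reaches the regex engine as a backreference.
// RegexOptions.Singleline lets . match newlines, since the blocks span several lines.
var re = new Regex(@"(start{(.*?)}((?:(start{(.*?)}.*?end{(.*?)})|.)*?)end{(\2)})", RegexOptions.Singleline);
var matches = re.Matches(input); // { "start{outer1}...", "start{outer2}..." }
var split = re.Split(input); // includes "... something ... ... blah blah blah ..." and "... something ..."; .NET also returns the captured group text in this array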

I'm not a Java developer, but you should look at using Pattern.split. It should do what you need.
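To illustrate the splitting idea in a runnable form, here is a minimal sketch in Python (not the asker's environment). The pattern is an assumed adaptation of the one in the question: braces are escaped, the outer tag is a named group (tag), and all other groups are non-capturing. The names BLOCK and text_between_blocks are made up for this example.

```python
import re

# Assumed adaptation of the question's pattern (not the original verbatim):
# one named group for the outer tag, everything else non-capturing.
BLOCK = re.compile(
    r"start\{(?P<tag>.*?)\}"               # opening tag, e.g. start{outer1}
    r"(?:start\{.*?\}.*?end\{.*?\}|.)*?"   # a nested block, or any single character
    r"end\{(?P=tag)\}",                    # closing tag with the same name
    re.DOTALL,                             # let . match newlines
)

def text_between_blocks(text):
    """Return the non-empty pieces of text that lie between matched blocks."""
    # re.split also returns the text of every capture group, so keep only
    # every (groups + 1)-th element: those are the pieces between matches.
    pieces = BLOCK.split(text)[:: BLOCK.groups + 1]
    return [p.strip() for p in pieces if p.strip()]
```

Calling text_between_blocks on the input from the question should leave only the "... something ..." and "... blah blah blah ..." lines. Many split implementations mix captured group text into the result, which is why the sketch keeps a single capture group and skips over it.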

Upvotes: 3

Ro Yo Mi

Reputation: 15000

Description

This regex separates the good parts from the bad. However, you will have to collect all the capture group 2 values and only use them when capture group 1 is empty.

^start\{([^}]*)\}.*?^end\{\1\}[\r\n]*|(.*?)[\r\n]*(?=^start\{[^}]*\}|\Z)


PHP Code Example:

Sample Text

start{outer1}
  Recipe A:
    start{inner1}
      -ingredient1
      -ingredient2
    end{inner1}
end{outer1}
... something ... 
... blah blah blah ...    
start{outer2}
  Recipe B:
    start{inner1}
      - ingredient1
    end{inner1}
end{outer2}
... something ...

Code

<?php
$sourcestring="your source string";
preg_match_all('/^start\{([^}]*)\}.*?^end\{\1\}[\r\n]*|(.*?)[\r\n]*(?=^start\{[^}]*\}|\Z)/imsx',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>

Matches

Note that capture group 2 contains only the desired values.

[0] => Array
    (
        [0] => start{outer1}
  Recipe A:
    start{inner1}
      -ingredient1
      -ingredient2
    end{inner1}
end{outer1}

        [1] => ... something ... 
... blah blah blah ...    

        [2] => start{outer2}
  Recipe B:
    start{inner1}
      - ingredient1
    end{inner1}
end{outer2}

        [3] => ... something ...
        [4] => 
    )

[1] => Array
    (
        [0] => outer1
        [1] => 
        [2] => outer2
        [3] => 
        [4] => 
    )

[2] => Array
    (
        [0] => 
        [1] => ... something ... 
... blah blah blah ...    
        [2] => 
        [3] => ... something ...
        [4] => 
    )
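The "collect group 2 when group 1 is empty" step can be sketched as follows. This is a Python port of the expression above, not part of the PHP example; the i, m and s flags are kept and x is dropped because the pattern contains no whitespace or comments. The names PATTERN and unmatched_text are made up for this sketch.

```python
import re

# Same expression as in the answer, compiled with the i, m and s flags.
PATTERN = re.compile(
    r"^start\{([^}]*)\}.*?^end\{\1\}[\r\n]*|(.*?)[\r\n]*(?=^start\{[^}]*\}|\Z)",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)

def unmatched_text(text):
    """Collect the group 2 values from matches where group 1 did not take part."""
    results = []
    for m in PATTERN.finditer(text):
        # Group 1 is set only for start/end blocks; group 2 holds the text
        # between blocks. Empty group 2 values (e.g. at end of input) are skipped.
        if m.group(1) is None and m.group(2):
            results.append(m.group(2))
    return results
```

Note that Python reports a group that did not participate as None rather than as an empty string, so the check differs slightly from what the print_r output suggests for PHP.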

Upvotes: 1

J David Smith

Reputation: 4810

There isn't really a way to negate a regex in the syntax of the regex itself. However, you could apply your regex to the input and then remove the matching regions using the returned indices. This would leave you with everything that didn't match the regex, effectively inverting it.
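A minimal sketch of that index-based removal, written in Python as an assumption about the environment. The helper remove_matches is a made-up name, and BLOCK is an assumed adaptation of the question's pattern (escaped braces, a named group for the outer tag), not the original verbatim.

```python
import re

# Assumed adaptation of the question's pattern; re.DOTALL lets . match newlines.
BLOCK = re.compile(
    r"start\{(?P<tag>.*?)\}(?:start\{.*?\}.*?end\{.*?\}|.)*?end\{(?P=tag)\}",
    re.DOTALL,
)

def remove_matches(pattern, text):
    """Return text with every region matched by pattern cut out."""
    kept = []
    pos = 0  # end index of the previous match
    for m in pattern.finditer(text):
        kept.append(text[pos:m.start()])  # keep what lies before this match
        pos = m.end()                     # skip over the matched region
    kept.append(text[pos:])               # keep the tail after the last match
    return "".join(kept)
```

remove_matches(BLOCK, text) keeps the newlines that surrounded each removed block, so you may want to drop blank lines afterwards.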

If you're doing this in a *nix shell, you can use:

grep -v <regex>

which prints every line that does NOT match the regex. Based on what you said, though, that doesn't appear to be your situation.

Upvotes: 1
