joshholat

Reputation: 3401

Regex to match specific string not enclosed by another, different specific string

I need a regex to match a string that is not enclosed by another, different specific string. For instance, in the following example it would split the content into two groups: 1) the content before the second {Switch} and 2) the content after the second {Switch}. It wouldn't match the first {Switch}, because that one is enclosed by the {my_string} tags. The string will always look like the one shown below (i.e. {my_string}any content here{/my_string}).

Some more  
  {my_string}
  Random content
  {Switch} //This {Switch} may or may not be here, but should be ignored if it is present
  More random content
  {/my_string}
Content here too
{Switch}
More content

So far I have the following, which I know isn't very close:

(.*?)\{Switch\}(.*?)

I'm just not sure how to use [^] (the "not" operator) with a whole string rather than with individual characters.
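To make the problem concrete, here is a small sketch of what happens when you split on the literal token with no awareness of the wrapper. It assumes the sample text from the question with the indentation and the `//This {Switch}...` note trimmed; `$doc` and `$parts` are illustrative names only.

```php
<?php
// The sample document from the question (indentation and inline note trimmed).
$doc = "Some more\n{my_string}\nRandom content\n{Switch}\nMore random content\n{/my_string}\nContent here too\n{Switch}\nMore content\n";

// Splitting on the literal token ignores the {my_string} wrapper entirely,
// so BOTH {Switch} occurrences act as separators.
$parts = preg_split('/\{Switch\}/', $doc);

echo count($parts), "\n"; // 3 parts, not the desired 2
```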

Upvotes: 1

Views: 234

Answers (5)

Mohammer

Reputation: 405

$regex = '/(?:(?!\{my_string\})(.*?))(\{Switch\})(?:(.*?)(?!\{my_string\}))/';
/* if "my_string" and "Switch" aren't wrapped by "{" and "}" just remove "\{" and "\}" */
$yourNewString = preg_replace($regex, '$1', $yourOriginalString);

This might work. I can't test it now, but I'll update later! I don't know if this is what you're looking for, but to negate more than one character, the regex syntax is:

(?!yourString) 

and it is called "negative lookahead assertion".
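A short sketch of why the lookahead, rather than [^...], is what negates a whole string. The pattern below is the common "tempered dot" idiom: it consumes one character at a time, but only while the text at the current position does not start with `{my_string}`. The sample strings are made up for illustration.

```php
<?php
// (?:(?!\{my_string\}).)* : advance one character at a time, but only where
// the upcoming text is NOT "{my_string}". Anchored with ^...$ this means
// "the whole subject never contains {my_string}".
$pattern = '/^(?:(?!\{my_string\}).)*$/s';

$withTag    = "before {my_string} inside {/my_string} after";
$withoutTag = "just {Switch} and other text";

var_dump(preg_match($pattern, $withTag));    // int(0): "{my_string}" appears
var_dump(preg_match($pattern, $withoutTag)); // int(1): it never appears

// By contrast, [^{my_string}] is a set of single characters ({, m, y, _, s, ...),
// so it rejects "Switch" because the letter "i" is in that set.
var_dump(preg_match('/^[^{my_string}]*$/', 'Switch')); // int(0)
```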

Edit:

This should work and return 1 (a match):

$stringMatchesYourRulesBoolean = preg_match('~(.*?)('.$my_string.')(.*?)(?<!'.$my_string.') ?('.$switch.') ?(?!'.$my_string.')(.*?)('.$my_string.')(.*?)~',$yourString);
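As an alternative to the lookaround approach above (this technique is not from the answer itself), PCRE's backtracking verbs `(*SKIP)(*FAIL)` can skip the wrapped blocks outright, so `preg_split` only sees the top-level {Switch}. This is a sketch assuming the {my_string} blocks are not nested (the lazy `.*?` stops at the first closing tag) and using the question's sample text with indentation trimmed.

```php
<?php
// Left branch: match a whole {my_string}...{/my_string} block, then fail on
// purpose. (*SKIP) makes the next attempt start AFTER that block, so any
// {Switch} inside it is never tried. Right branch: the {Switch} separators
// that remain outside every block. Assumes blocks are not nested.
$pattern = '~\{my_string\}.*?\{/my_string\}(*SKIP)(*FAIL)|\{Switch\}~s';

$doc = "Some more\n{my_string}\nRandom content\n{Switch}\nMore random content\n{/my_string}\nContent here too\n{Switch}\nMore content\n";

$parts = preg_split($pattern, $doc);
print_r($parts); // two parts: everything up to the second {Switch}, then the rest
```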

Upvotes: 1

Wh1T3h4Ck5

Reputation: 8509

Try this simple function:


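function find_content($doc) {
  // Collect every {my_string}...{/my_string} block (case-insensitive, across lines)
  preg_match_all('~\{my_string\}.*?\{/my_string\}~is', $doc, $x);
  // Swap each block for a placeholder so the {Switch} tokens inside it are hidden
  $temp = $doc;
  foreach ($x[0] as $i => $block) {
    $temp = str_replace($block, "{REPL:$i}", $temp);
  }
  // Split on the {Switch} tokens that remain, i.e. the ones outside any block
  $res = explode('{Switch}', $temp);
  // Put the original blocks back into each part
  foreach ($res as &$part) {
    foreach ($x[0] as $id => $content) {
      $part = str_replace("{REPL:$id}", $content, $part);
    }
  }
  unset($part); // break the reference left over from foreach-by-reference
  return $res;
}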

Use it like this:

$content_parts = find_content($doc); // $doc is your input document
print_r($content_parts);

Output (for your example):

Array
(
    [0] => Some more
{my_string}
Random content
{Switch} //This {Switch} may or may not be here, but should be ignored if it is present
More random content
{/my_string}
Content here too

    [1] => 
More content
)

Upvotes: 1

kingjeffrey

Reputation: 15280

You can try positive lookahead and lookbehind assertions (http://www.regular-expressions.info/lookaround.html)

It might look something like this:

$content = 'string of text before some random content switch text some more random content string of text after';
$before  = preg_quote('string of text before', '/');
$switch  = preg_quote('switch text', '/');
$after   = preg_quote('string of text after', '/');
// (.*) is greedy, so the switch must be required here; if it were optional, group 1 would swallow everything
if( preg_match('/(?<=' . $before . ')(.*)(?:' . $switch . ')(.*)(?=' . $after . ')/', $content, $matches) ) {
    // $matches[1] == ' some random content '
    // $matches[2] == ' some more random content '
}

Upvotes: 1

Benjamin Crouzier

Reputation: 41895

Have a look at PHP PEG, a small PEG parser library written in PHP. You can write your own grammar and parse your input with it. It should be quite simple in your case.

The grammar syntax and how parsing works are explained in README.md.

Extracts from the readme:

  token*  - Token is optionally repeated
  token+  - Token is repeated at least once
  token?  - Token is optionally present

Tokens may be:

 - bare-words, which are recursive matchers - references to token rules defined elsewhere in the grammar,
 - literals, surrounded by `"` or `'` quote pairs. No escaping support is provided in literals.
 - regexes, surrounded by `/` pairs.
 - expressions - single words (match \w+)

Sample grammar (file EqualRepeat.peg.inc):

class EqualRepeat extends Packrat {
/* Any number of a followed by the same number of b and the same number of c characters
 * aabbcc - good
 * aaabbbccc - good
 * aabbc - bad
 * aabbacc - bad
 */

/*Parser:Grammar1
A: "a" A? "b"
B: "b" B? "c"
T: !"b"
X: &(A !"b") "a"+ B !("a" | "b" | "c")
*/
}

Upvotes: 0

zigdon

Reputation: 15063

It really seems you're trying to use a regular expression to parse a grammar - something that regular expressions are really bad at doing. You might be better off writing a parser to break down your string into the tokens that build it, and then processing that tree.

Perhaps something like http://drupal.org/project/grammar_parser might help.
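A minimal sketch of the "tokenize, then process" idea, written by hand rather than with a parser library. `split_outside_blocks` is a hypothetical helper name, and the sketch assumes the only tokens that matter are `{my_string}`, `{/my_string}` and `{Switch}`. Because it tracks nesting depth, it also copes with nested blocks, which a single regex handles poorly.

```php
<?php
// Hypothetical helper: split $doc on {Switch} tokens that sit outside every
// {my_string}...{/my_string} block, by tokenizing and tracking nesting depth.
function split_outside_blocks(string $doc): array
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the three tag tokens in the result,
    // interleaved with the plain text between them.
    $tokens = preg_split(
        '~(\{my_string\}|\{/my_string\}|\{Switch\})~',
        $doc,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $parts = [''];
    $depth = 0; // how many {my_string} blocks are currently open

    foreach ($tokens as $token) {
        if ($token === '{my_string}') {
            $depth++;
        } elseif ($token === '{/my_string}') {
            $depth = max(0, $depth - 1); // tolerate a stray closing tag
        } elseif ($token === '{Switch}' && $depth === 0) {
            $parts[] = ''; // top-level separator: start a new part
            continue;      // and drop the separator itself
        }
        // Everything else (text, tags, nested {Switch}) is kept verbatim.
        $parts[count($parts) - 1] .= $token;
    }

    return $parts;
}

$doc = "Some more\n{my_string}\nRandom content\n{Switch}\nMore random content\n{/my_string}\nContent here too\n{Switch}\nMore content\n";
print_r(split_outside_blocks($doc));
```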

Upvotes: 2
