SISYN
SISYN

Reputation: 2259

Correcting an illegal PCRE regex in PHP

Update 5/26

I've fixed the behavior of the regular expressions that were previously contained in this question, but as others have mentioned, my syntax still wasn't correct. Apparently the fact that it compiles is due to PHP's preg_* family of functions overlooking my mistakes.

I'm definitely a PCRE novice so I'm trying to understand what mistakes are present so that I can go about fixing them. I'm also open to critique about design/approach, and as others have mentioned, I am also going to build in compatibility with JSON and YAML, but I'd like to go ahead and finish this home-brewed parser since I have it working and I just need to work on the expression syntax (I think).

Here are all of the preg_match_all references and the one preg_replace reference extracted from the whole page of code:

// matches the outside container of objects {: and :}
$regex = preg_match_all('/\s\{:([^\}]+):\}/i', $this->html, $HTMLObjects);

// double checks that the object container is removed
$markup = preg_replace('/[\{:]([^\}]+):\}/i', '$1', $markup);

// matches all dynamic attributes (those containing bracketed data)
$dynamicRegEx = preg_match_all('/[\n]+([a-z0-9_\-\s]+)\[([^\]]+)\]/', $markup, $dynamicMatches);

// matches all static attributes (simple colon-separated attributes)
$staticRegEx = preg_match_all('/([^:]+):([^\n]+)/', $staticMarkup, $staticMatches);

If you'd like to see the preg_match_all and preg_replace references in context so that you can comment/critique that as well, you can see the containing source file by following the link below.

Note: viewing the source code of the page makes everything much more readable http://mdl.fm/codeshare.php?htmlobject

Like I said, I have it functioning as it stands, I'm just asking for supervision on my PCRE syntax so that it isn't illegal. However, if you have comments on the structure/design or anything else I'm open to all suggestions.

Upvotes: 1

Views: 74

Answers (1)

Tim Pietzcker
Tim Pietzcker

Reputation: 336158

(Rewritten to reflect new question)

The first regex is correct, but you don't need to escape } within a character class. Also, I usually include both braces to avoid the matching of nested objects (your regex would match {:foo {:bar:} in the string "{:foo {:bar:} baz:}"), mine would only match {:bar:}. The /i mode modifier is useless since there is no cased text in your regex.

// matches the outside container of objects {: and :}
$regex = preg_match_all('/\s\{:([^{}]+):\}/', $this->html, $HTMLObjects);

In your second regex, there is an incorrect character class at the start that needs to be removed. Otherwise, it's the same.

// double checks that the object container is removed
$markup = preg_replace('/\{:([^{}]+):\}/', '$1', $markup);

Your third regex looks OK; there's another useless character class, though. Again, I've included both brackets in the negated character class. I'm not sure why you've made it case-sensitive - shouldn't there be an /i modifier here?

// matches all dynamic attributes (those containing bracketed data)
$dynamicRegEx = preg_match_all('/\n+([a-z0-9_\-\s]+)\[([^\[\]]+)\]/i', $markup, $dynamicMatches);

The last regex is OK, but it will always match from the very first character of the string until the first colon (and then on to the rest of the line). I think I would add a newline character to the first negated character class to make sure that can't happen:

// matches all static attributes (simple colon-separated attributes)
$staticRegEx = preg_match_all('/([^\n:]+):([^\n]+)/', $staticMarkup, $staticMatches);

Upvotes: 3

Related Questions