Synchro

Reputation: 37810

Recursive/subroutine regex to match CSS media queries

I'm looking for a regular expression (in PHP PCRE) that can match media queries and their contents reliably, including the somewhat odd case where a media query body is empty. Source text might be:

@media only screen {
    p {
        color:red;
    }
}
@media only screen and (max-width: 596px) {
    p {
        color:blue;
    }
    img {
        max-width: 200px;
    }
}
@media only screen {

}
img {
    display: block;
}
@media only screen and (max-width: 240px) {
    p {
        color:green;
    }
}
p {
    font-weight: normal;
}

I want to capture each media query and its CSS body as subpatterns, so I'll end up with a PHP array like:

[['@media only screen {
        p {
            color:red;
        }
    }','p {
            color:red;
        }'],...]

The key thing is that this needs to be a recursive or subroutine pattern in order to balance the braces. The empty query is enough to confuse the pattern in this question, because it can't distinguish the end of a CSS rule from the end of the empty media query:

/@media[^{]+\{([\s\S]+?\})\s*\}/
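
To make the failure concrete, here is a minimal sketch that runs the pattern above over a shortened version of the sample (the variable names `$css`, `$naive` and `$m` are just for illustration). The input has three media queries, but the empty one makes the lazy match run on through the `img` rule and the following query:

```php
<?php
// Shortened sample: a normal query, an empty query, a plain rule, another query.
$css = "@media only screen {\n    p {\n        color:red;\n    }\n}\n"
     . "@media only screen {\n\n}\n"
     . "img {\n    display: block;\n}\n"
     . "@media only screen and (max-width: 240px) {\n    p {\n        color:green;\n    }\n}\n";

// The non-recursive pattern from above.
$naive = '/@media[^{]+\{([\s\S]+?\})\s*\}/';
preg_match_all($naive, $css, $m);

// Three media queries are present, but only two matches come back.
echo count($m[0]), "\n";

// The second match starts at the empty query and swallows the img rule
// and the last media query along with it.
echo $m[0][1], "\n";
```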

I've been trying and failing to use the advice in this article to make a pattern of the form (b(?:m|(?1))*e), where b is what begins the construct, m is what can occur in the middle of the construct, and e is what can occur at the end, and none of them can match the same thing.

So, b should be @media[^{]+\{, e should be \}, and m needs to consume CSS rules, perhaps ([^{]+?\{[^}]*?\s*\}), giving me:

/(@media[^{]+\{(?:([^{]+?\{[^}]*?\}\s*)*|(?1))*\})/s

However, that doesn't work so I'm a bit lost. Can anyone suggest an effective pattern?

I've set up a regex test here.

Alternatively, a non-regex parser might work better.
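
A rough sketch of what such a parser could look like is a plain brace-counting scan. The function name `extractMediaBlocks()` is made up for illustration, and the sketch assumes that braces never appear inside strings or comments in the CSS and that `@media` only occurs as an at-rule:

```php
<?php
// Hypothetical helper (not from any library): scan for "@media", then count
// braces to find the matching close. Assumes no braces in strings/comments.
function extractMediaBlocks(string $css): array
{
    $blocks = [];
    $offset = 0;
    $len = strlen($css);
    while (($start = strpos($css, '@media', $offset)) !== false) {
        $open = strpos($css, '{', $start);
        if ($open === false) {
            break; // no body follows this @media
        }
        $depth = 0;
        $close = null;
        for ($i = $open; $i < $len; $i++) {
            if ($css[$i] === '{') {
                $depth++;
            } elseif ($css[$i] === '}') {
                $depth--;
                if ($depth === 0) {
                    $close = $i; // matching brace for the query body
                    break;
                }
            }
        }
        if ($close === null) {
            break; // unbalanced input: stop rather than guess
        }
        $blocks[] = [
            substr($css, $start, $close - $start + 1),         // whole query
            trim(substr($css, $open + 1, $close - $open - 1)), // body only
        ];
        $offset = $close + 1;
    }
    return $blocks;
}

$css = "@media only screen {\n    p {\n        color:red;\n    }\n}\n"
     . "@media only screen {\n\n}\n"
     . "img {\n    display: block;\n}\n"
     . "@media only screen and (max-width: 240px) {\n    p {\n        color:green;\n    }\n}\n";

$blocks = extractMediaBlocks($css);
print_r($blocks);
```

Each entry holds the full query and its trimmed body, so the empty query comes back with an empty string as its body.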

Note that I'm not attempting to validate or match CSS selectors in general (not a job for a regex), just grab the content of the query and its body.

Update: added more sample content and explained what I want to get out.

Upvotes: 2

Views: 938

Answers (1)

Wiktor Stribiżew

Reputation: 627469

If you are sure the blocks you want to match always have balanced braces, you can use a regex with a subroutine call like this:

'~@media\b[^{]*({((?:[^{}]+|(?1))*)})~'

See the regex demo

And here is an IDEONE demo:

$re = '~@media\b[^{]*({((?:[^{}]+|(?1))*)})~'; 
$str = "@media only screen {\n    p {\n        color:red;\n    }\n}\n@media only screen and (max-width: 596px) {\n    p {\n        color:blue;\n    }\n    img {\n        max-width: 200px;\n    }\n}\n@media only screen {\n\n}\nimg {\n    display: block;\n}\n@media only screen and (max-width: 240px) {\n    p {\n        color:green;\n    }\n}\np {\n    font-weight: normal;\n}"; 
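preg_match_all($re, $str, $matches, PREG_PATTERN_ORDER);
print_r($matches[0]); // whole matches: each @media query with its {...} block
print_r($matches[2]); // Group 2: the contents between the outer braces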

Pattern details:

  • @media\b - match @media as a whole word (since \b is a word boundary)
  • [^{]* - match 0+ characters other than {
  • ({((?:[^{}]+|(?1))*)}) - capturing group #1, matching a {...} block with a balanced number of { and } (note it is a technical group: it exists so that (?1) can recurse into it and match nested {...}s correctly). It matches...
    • { - an opening brace
    • ((?:[^{}]+|(?1))*) - Group 2 (the contents inside the balanced {...}) matching
      • [^{}]+ - 1+ characters other than { and } (because we need to match everything that is not the leading and trailing delimiters)
      • | - or...
      • (?1) - the whole Group 1 subpattern
    • } - a closing brace
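
The same breakdown can be written into the pattern itself with the x (extended) flag, which ignores whitespace and allows # comments. This is only a sketch of an equivalent layout; the braces are escaped here for readability, and the shortened `$css` sample is made up for the demonstration:

```php
<?php
$re = '~
    @media\b        # the literal "@media" as a whole word
    [^{]*           # the query text, up to the first opening brace
    (               # group 1: one balanced block
      \{            #   opening brace
      (             #   group 2: the block contents
        (?: [^{}]+  #     a run of non-brace characters
          | (?1)    #     or a nested block, by recursing into group 1
        )*
      )
      \}            #   closing brace
    )
~x';

$css = "@media only screen {\n    p {\n        color:red;\n    }\n}\n"
     . "@media only screen {\n\n}\n"
     . "img {\n    display: block;\n}\n"
     . "@media only screen and (max-width: 240px) {\n    p {\n        color:green;\n    }\n}\n";

preg_match_all($re, $css, $m);

echo count($m[0]), "\n";       // number of media queries found
var_dump(trim($m[2][1]));      // body of the empty query, whitespace removed
```

With this input the empty query is matched on its own, and the `img` rule between the queries is left out of every match.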

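Note that each element of $matches[2] can be further processed with a call like preg_match_all('~\s*(\w+)\s*{\s*([^}]*?)\s*}~', $body, $subblocks), where $body is one of the captured bodies (preg_match_all expects a string subject, so the $matches[2] array itself cannot be passed in directly).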
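
A short sketch of that second pass, looping over the captured bodies one string at a time. The `$rules` array and the single-query `$str` are just for illustration, and since the selector part is `\w+`, this only handles simple element selectors such as p or img:

```php
<?php
$re = '~@media\b[^{]*({((?:[^{}]+|(?1))*)})~';
$str = "@media only screen and (max-width: 596px) {\n    p {\n        color:blue;\n    }\n    img {\n        max-width: 200px;\n    }\n}";

preg_match_all($re, $str, $matches);

$rules = [];
foreach ($matches[2] as $body) {
    // Split one media query body into selector / declarations pairs.
    preg_match_all('~\s*(\w+)\s*{\s*([^}]*?)\s*}~', $body, $subblocks, PREG_SET_ORDER);
    foreach ($subblocks as $rule) {
        $rules[] = [$rule[1], $rule[2]]; // [selector, declarations]
    }
}

print_r($rules);
```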

Upvotes: 4
