Reputation: 663

Parsing css with a regex

I want to scan through a CSS file and capture both the comments and the CSS rules. I've come up with a regex that's almost there, but it misses rules whose selector list spans more than one line, e.g.:

ul.menu li a, # Won't capture this line
ul.nice-menu li a { text-decoration: none; cursor:pointer; }

Here's the regex that I'm working with:

(\/\*[^.]+\*\/\n+)?([\t]*[a-zA-Z0-9\.# -_:@]+[\t\s]*\{[^}]+\})

I've been testing this at rubular.com and here is what it currently matches, and what the array output is like.

Result 1

[0] /* Index */
/*
GENERAL

PAGE REGIONS
- Header bar region
- Navigation bar region
- Footer region           
SECTION SPECIFIC
- Homepage
- News */

[1] html { background: #ddd; }

Result 2

[0]
[1] body { background: #FFF; font-family: "Arial", "Verdana", sans-serif; color: #545454;}

I must point out that I'm still new to regular expressions, so if anyone can help and show me where I'm going wrong, it'd be much appreciated :)

BTW: I'm using PHP and preg_match_all

Upvotes: 0

Answers (2)

Jørgen Fogh

Reputation: 7656

What language are you using?

You should probably just use a library to parse the CSS. Libraries can save you a lot of grief.

Upvotes: 0

peter.murray.rust

Reputation: 38073

CSS cannot be fully parsed with a regex (see CSS Grammar: http://www.w3.org/TR/CSS2/grammar.html). A selector list can be split over several lines, for example, and your current version wouldn't handle this, because the character class you use for the selector doesn't match newlines. If you need to parse CSS properly, you should read the CSS spec and use a tool like ANTLR to generate a parser.
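If the stylesheet is known to be flat (no @media blocks, no braces inside comments or strings), the multi-line selector case from the question can be approximated by letting the selector part match anything except braces, semicolons and slashes. The following is only a rough sketch under those assumptions, not a real parser. It is written in Python for brevity; RULE, parse_flat_css and SAMPLE are names made up for this illustration. The pattern uses only PCRE-style constructs, so it should carry over to preg_match_all with the s modifier, though that is untested here.

```python
import re

# Assumed pattern (not from the original post), for flat CSS only:
#   group 1: an optional /* ... */ comment directly before the rule
#   group 2: the selector list, which may span several lines
#   group 3: the declarations between the braces
RULE = re.compile(
    r'(/\*(?:(?!\*/).)*\*/)?\s*'  # optional comment, stops at the first */
    r'([^{};/]+?)\s*'             # selectors: no braces, semicolons or slashes
    r'\{([^}]*)\}',               # declaration block without nested braces
    re.DOTALL,                    # let "." match newlines inside comments
)


def parse_flat_css(css):
    """Return (comment, selectors, declarations) tuples for un-nested rules."""
    rules = []
    for m in RULE.finditer(css):
        # Collapse a selector list spread over several lines onto one line.
        selectors = ' '.join(m.group(2).split())
        rules.append((m.group(1), selectors, m.group(3).strip()))
    return rules


# Hypothetical input built from the snippets in the question.
SAMPLE = '''/* Links */
ul.menu li a,
ul.nice-menu li a { text-decoration: none; cursor:pointer; }
html { background: #ddd; }
'''

for rule in parse_flat_css(SAMPLE):
    print(rule)
```

On this sample, the first tuple carries the comment together with both selectors joined onto one line, and the second tuple has None in place of a comment. Anything nested, such as an @media block, is still beyond this pattern.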

Here is an example from the W3C spec (http://www.w3.org/TR/CSS2/syndata.html):

@import "subs.css";
@import "print-main.css" print;
@media print {
  body { font-size: 10pt }
}
h1 {color: blue }

No ordinary regex is powerful enough to deal with arbitrarily nested {...} blocks, let alone the contents of the imported stylesheets.
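To make the nesting problem concrete, here is a small sketch that runs a flat "prelude { body }" pattern over the spec example above. FLAT_RULE and flat_rules are hypothetical names for this illustration, and Python is used only for brevity; a PCRE pattern of the same shape would be expected to behave the same way.

```python
import re

# The CSS 2 spec excerpt quoted above.
SPEC_EXAMPLE = '''@import "subs.css";
@import "print-main.css" print;
@media print {
  body { font-size: 10pt }
}
h1 {color: blue }
'''

# Hypothetical flat pattern: text before "{", then everything up to the
# first "}". It has no notion of one block sitting inside another.
FLAT_RULE = re.compile(r'([^{};]+)\{([^}]*)\}')


def flat_rules(css):
    """Return (prelude, body) pairs as the flat pattern sees them."""
    return [(m.group(1).strip(), m.group(2).strip())
            for m in FLAT_RULE.finditer(css)]


for prelude, body in flat_rules(SPEC_EXAMPLE):
    print(repr(prelude), '->', repr(body))
```

The pattern reports '@media print' with the body 'body { font-size: 10pt', i.e. the inner rule is swallowed as if it were declaration text and the block is cut off at the first closing brace. The h1 rule is found correctly, and the two @import lines are skipped entirely.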

Upvotes: 6
