Randy Hall

Reputation: 8137

Parse CSS3 selectors with regex (JavaScript)

If this question has been asked before, please link to it; perhaps I am searching for the wrong things. All I find are results about parsing CSS files. Basically, I have an array of selectors, something like

[".thislink", "#myid"].

I'm looking to pass any selector string formatted like a CSS3 selector, e.g.:

a.thislink:not(.ignore)[href^=http://]

into a .match and split it out into an array of selectors, ideally:

["a", ".thislink", ":not(.ignore)", "[href^=http://]"]

that I can loop through. I would then use that same breakdown on any :not() selectors to get a second array of "not", which I can match against my original array of individual selectors.

Tag, class, ID, attr, and :not selectors should be all I need. I can figure out how to break down the [attr=val] and :not(selectorshere) myself, I think.

PS: I know it would be easy to match my original array values in the string selector, however, I don't actually have an array of selectors. It would take several paragraphs to explain why exactly I'm doing it this way, so just trust me, I can't do it =)

Upvotes: 0

Views: 922

Answers (1)

aefxx

Reputation: 25249

In case you don't find a sufficient regex, may I suggest a JavaScript parser generator like PEG.js*. There's an online version of PEG.js that lets you tinker with the grammar and then download the parser once you're satisfied with the result.

[ * ] PEG - Parsing Expression Grammar

For help with the grammar you should consult the W3C's recommendation on Selectors Level 3.

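I took some time to play around and came up with a reduced grammar for a single selector (element/id/attr/class/pseudo). You'll probably want to go over it and refine it here and there. Note that, as written, the id must come directly after the element, and the pseudo rule only matches functional pseudo-classes such as :not(...).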

/*
 * PEG.js grammar
 */
start      = element? hash? (class / attr / pseudo)*
element    = '*' / ident

ident      = i:(nmstart) j:(nmchar*) {return i + j.join('');}
hash       = h:('#' ident) {return h.join('');}
class      = c:('.' ident) {return c.join('');}
attr       = a:('[' (b:[^\]]+ {return b.join('');}) ']') {return a.join('');}
pseudo     = p:(':' function) {return p.join('');}

nmstart    = [a-zA-Z_] / nonascii
nmchar     = [a-zA-Z0-9_-] / nonascii
function   = f:(ident '(' body ')') {return f.join('');}
body       = b:[^\)]+ {return b.join('');}

nonascii   = [\x80-\xff]
_          = [ \t\n\r]+ {return '';}
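
For comparison, here is a minimal regex sketch in plain JavaScript that covers roughly the same subset as the grammar above and returns the flat array the question asks for. The helper names splitSelector and notParts are hypothetical, and the pattern makes some simplifying assumptions: no nested parentheses inside :not(...), no ']' inside attribute values, and no combinators. Treat it as a starting point rather than a complete CSS3 selector parser.

```javascript
// Regex sketch covering: tag or '*', #id, .class, [attr...], :name, :name(args).
// Assumptions: no nested parentheses inside :not(...), no ']' inside
// attribute values, and no combinators (spaces, '>', '+', '~').
const IDENT = '-?[a-zA-Z_\\u0080-\\uffff][a-zA-Z0-9_\\u0080-\\uffff-]*';
const TOKEN = new RegExp(
  '\\*' +                              // universal selector
  '|\\[[^\\]]+\\]' +                   // [attr], [attr=val], [attr^=val], ...
  '|:' + IDENT + '(?:\\([^)]*\\))?' +  // :hover, :not(...)
  '|[#.]' + IDENT +                    // #id, .class
  '|' + IDENT,                         // tag name
  'g'
);

// Hypothetical helper: split one compound selector into its simple parts.
function splitSelector(selector) {
  const parts = selector.match(TOKEN) || [];
  // Reject input that the pattern could not fully consume.
  if (parts.join('') !== selector) {
    throw new SyntaxError('Unsupported selector: ' + selector);
  }
  return parts;
}

// Hypothetical helper: collect the simple selectors inside every :not(...).
function notParts(parts) {
  const out = [];
  for (const part of parts) {
    const m = /^:not\((.*)\)$/.exec(part);
    if (m) out.push(...splitSelector(m[1]));
  }
  return out;
}

const parts = splitSelector('a.thislink:not(.ignore)[href^=http://]');
console.log(parts);            // ['a', '.thislink', ':not(.ignore)', '[href^=http://]']
console.log(notParts(parts));  // ['.ignore']
```

The join-and-compare check in splitSelector is what keeps the regex honest: String.prototype.match with a global pattern silently skips characters it cannot match, so without that check a selector such as 'a > b' would come back as ['a', 'b'] instead of being rejected.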

Upvotes: 3
