If someone has seen this question before, please link it; perhaps I am searching for the wrong terms. All I find are results about parsing CSS files. Basically, I have an array of selectors, something like:
[".thislink", "#myid"]
I'm looking to pass any selector string written in CSS3 selector syntax, for example:
a.thislink:not(.ignore)[href^=http://]
into a .match() call and split it into an array of simple selectors, ideally:
["a", ".thislink", ":not(.ignore)", "[href^=http://]"]
that I can loop through. I would then apply the same breakdown to any :not() selectors to get a second array of "not" selectors, which I can match against my original array of individual selectors.
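A minimal regex-based sketch of that split, assuming the input is a single compound selector: no combinators (spaces, >, +, ~) and no nested parentheses inside :not(). The names SIMPLE_SELECTOR_RE, splitSelector and notSelectors are hypothetical, made up for this example.

```javascript
// One alternative per kind of simple selector:
//   *  |  type  |  #id  |  .class  |  [attr...]  |  :pseudo or :pseudo(...)
// Assumptions: no combinators and no nested parentheses.
const SIMPLE_SELECTOR_RE =
  /\*|[_a-zA-Z][\w-]*|#[\w-]+|\.[\w-]+|\[[^\]]+\]|:[\w-]+(?:\([^)]*\))?/g;

// Split a compound selector into its simple selectors.
// With the /g flag, String.prototype.match returns every match, or null.
// Characters that no alternative matches are skipped silently.
function splitSelector(selector) {
  return selector.match(SIMPLE_SELECTOR_RE) || [];
}

// Apply the same split to the argument of every :not(...) part.
function notSelectors(parts) {
  const result = [];
  for (const part of parts) {
    const m = /^:not\((.*)\)$/.exec(part);
    if (m) result.push(...splitSelector(m[1]));
  }
  return result;
}

const parts = splitSelector("a.thislink:not(.ignore)[href^=http://]");
// parts → ["a", ".thislink", ":not(.ignore)", "[href^=http://]"]
const nots = notSelectors(parts);
// nots → [".ignore"]

// Compare both arrays against the list of known selectors from the question.
const known = [".thislink", "#myid"];
const required = parts.filter((p) => known.includes(p)); // [".thislink"]
const excluded = nots.filter((p) => known.includes(p)); // []
```

Because .match() silently skips anything it cannot tokenize, a malformed selector yields a partial array rather than an error; that is the main trade-off against a real parser.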
Tag, class, ID, attribute, and :not selectors should be all I need. I think I can figure out how to break down the [attr=val] and :not(selectorshere) parts myself.
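For the second-level breakdown, here is a small sketch that splits an attribute selector into name, operator and value, plus a helper that pulls out the argument of :not(). Both helpers (parseAttr, notArgument) are hypothetical; the operator list (=, ~=, |=, ^=, $=, *=) comes from Selectors Level 3, and unquoted values such as http:// are accepted leniently even though strict CSS would require quotes there.

```javascript
// Attribute selector: [name], or [name OP value] where OP is one of
// =, ~=, |=, ^=, $=, *= and value is double-quoted, single-quoted or bare.
const ATTR_RE =
  /^\[\s*([_a-zA-Z-][\w-]*)\s*(?:([~|^$*]?=)\s*(?:"([^"]*)"|'([^']*)'|([^\]\s]+))\s*)?\]$/;

function parseAttr(part) {
  const m = ATTR_RE.exec(part);
  if (m === null) return null; // not an attribute selector
  const [, name, op, dq, sq, bare] = m; // unmatched groups are undefined
  return {
    name,
    op: op ?? null, // null for the presence form [attr]
    value: dq ?? sq ?? bare ?? null,
  };
}

// Return the text inside :not(...), or null if the part is not a :not().
function notArgument(part) {
  const m = /^:not\((.*)\)$/.exec(part);
  return m === null ? null : m[1];
}

const attr = parseAttr("[href^=http://]");
// attr → { name: "href", op: "^=", value: "http://" }
const inner = notArgument(":not(.ignore)");
// inner → ".ignore"
```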
PS: I know it would be easy to search the selector string for my original array values; however, I don't actually have an array of selectors. It would take several paragraphs to explain exactly why I'm doing it this way, so just trust me: I can't do it =)
In case you don't manage to find a sufficient regex, may I suggest a JavaScript parser generator such as PEG.js*? There's an online version of PEG.js that lets you tinker with the grammar and then download the parser once you're satisfied with the result.
[*] PEG: Parsing Expression Grammar
For help with the grammar, consult the W3C's working draft on CSS3 syntax and the W3C's recommendation on Selectors Level 3.
I took some time to play around and came up with a reduced grammar for a single compound selector (element/ID/attribute/class/pseudo). You'd probably want to go over it and refine it here and there.
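/*
 * PEG.js grammar for one compound selector, e.g.
 *   a.thislink:not(.ignore)[href^=http://]
 * The start rule returns a flat array of its simple selectors.
 */
start = e:element? rest:(hash / class / attr / pseudo)* {
  // Optional type selector first, then the remaining parts in source order.
  return (e ? [e] : []).concat(rest);
}
element = '*' / ident
ident = i:nmstart j:nmchar* {return i + j.join('');}
hash = h:('#' ident) {return h.join('');}
class = c:('.' ident) {return c.join('');}
// Attribute selector: the text between the brackets is kept verbatim.
attr = a:('[' (b:[^\]]+ {return b.join('');}) ']') {return a.join('');}
// Functional pseudo-class such as :not(...), or a plain one such as :hover.
pseudo = p:(':' (function / ident)) {return p.join('');}
nmstart = [_a-z]i / nonascii
nmchar = [_a-z0-9-]i / nonascii
function = f:(ident '(' body ')') {return f.join('');}
// Note: nested parentheses, e.g. :not(:nth-child(2)), are not supported.
body = b:[^\)]+ {return b.join('');}
nonascii = [\x80-\xff]
_ = [ \t\n\r]+ {return '';}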
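If you'd rather avoid a generated parser, the same structure (optional element, then a repeated ordered choice of hash / class / attr / pseudo) can be hand-written as a small recursive-descent parser. This is a sketch, not PEG.js output; the function name parseCompoundSelector is hypothetical. It accepts uppercase letters in identifiers, an #id in any position after the element, and pseudo-classes without parentheses such as :hover, and unlike a plain .match() it throws on input it cannot consume.

```javascript
function parseCompoundSelector(input) {
  let pos = 0;

  // Try a sticky (/y) regex at the current position; consume it on success.
  function eat(re) {
    re.lastIndex = pos;
    const m = re.exec(input);
    if (m === null) return null;
    pos += m[0].length;
    return m[0];
  }

  // ident = nmstart nmchar*  (case-insensitive, non-ASCII allowed)
  const IDENT = /[_a-z\u0080-\uffff][_a-z0-9\u0080-\uffff-]*/iy;
  const ident = () => eat(IDENT);

  // element = '*' / ident
  const element = () => eat(/\*/y) ?? ident();

  // hash = '#' ident, class = '.' ident; backtrack if no ident follows.
  function prefixed(ch) {
    return () => {
      const start = pos;
      if (input[pos] !== ch) return null;
      pos++;
      const id = ident();
      if (id === null) { pos = start; return null; }
      return ch + id;
    };
  }
  const hash = prefixed("#");
  const cls = prefixed(".");

  // attr = '[' [^\]]+ ']'
  const attr = () => eat(/\[[^\]]+\]/y);

  // pseudo = ':' ident, optionally followed by '(' [^)]+ ')'
  function pseudo() {
    const start = pos;
    if (input[pos] !== ":") return null;
    pos++;
    const name = ident();
    if (name === null) { pos = start; return null; }
    const args = eat(/\([^)]+\)/y);
    return ":" + name + (args ?? "");
  }

  const parts = [];
  const el = element();
  if (el !== null) parts.push(el);
  // (hash / class / attr / pseudo)*: ordered choice, repeated until none match.
  for (;;) {
    const part = hash() ?? cls() ?? attr() ?? pseudo();
    if (part === null) break;
    parts.push(part);
  }
  if (pos !== input.length) {
    throw new SyntaxError(`Unexpected "${input[pos]}" at position ${pos}`);
  }
  return parts;
}

const result = parseCompoundSelector("a.thislink:not(.ignore)[href^=http://]");
// result → ["a", ".thislink", ":not(.ignore)", "[href^=http://]"]
```

Each rule returns either the matched text or null, so the ?? chain plays the role of PEG's ordered choice `/`.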