Regex in python to get javadoc-style comments in CSS

Question

I'm writing a Python script to loop through a directory of CSS files and save the contents of any that contain a specifically formatted Javadoc-style comment.

The comment/CSS looks like this:

/**thirdpartycss

* @description Used for fixing stuff

*/
.class_one {
    margin: 10px;
}
#id_two {
    padding: 2px;
}

The regex to fetch the entire contents of the file looks like this:

pattern = r"/\*\*thirdpartycss(.*?)}$"
matches = re.findall(pattern, css, flags=re.MULTILINE | re.DOTALL)

This gives me the file contents. What I want to do now is write a regex to grab each CSS definition within the file. This is what I tried:

rule_pattern = "(.*){(.*)}?"
rules = re.findall(rule_pattern, matches[0], flags=re.MULTILINE | re.DOTALL)

I'm basically trying to find any text, then an opening {, any text, then a closing } - I want a list of all of the CSS classes, essentially, but this just returns the entire string in one chunk.

Can anybody point me in the right direction?

Thanks. Matt

Alex Martelli · Accepted Answer

{(.*)} is a greedy match -- it will match from the first { to the last }, and thus gobbles up any {/} pairs that might be inside those. You want non-greedy matching, that is

{(.*?)}

the difference is the question mark after the asterisk, making it non-greedy.
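To make the difference concrete, here is a minimal sketch run against the CSS from the question. The comment-stripping step and the `[^{}]+` selector group are my own assumptions about what the asker wants (a list of selector/declaration pairs), not something the question specifies.

```python
import re

css = """/**thirdpartycss
* @description Used for fixing stuff
*/
.class_one {
    margin: 10px;
}
#id_two {
    padding: 2px;
}
"""

# Assumption: drop /* ... */ comments first so they don't leak into a selector.
body = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)

# Greedy: .* runs from the first { all the way to the last }.
greedy = re.findall(r"\{(.*)\}", body, flags=re.DOTALL)

# Non-greedy: .*? stops at the first } that follows each {.
rules = re.findall(r"([^{}]+)\{(.*?)\}", body, flags=re.DOTALL)
rules = [(selector.strip(), decls.strip()) for selector, decls in rules]

print(len(greedy))  # 1 -- everything between the outermost braces in one chunk
print(rules)        # [('.class_one', 'margin: 10px;'), ('#id_two', 'padding: 2px;')]
```

Using `[^{}]+` for the selector instead of `(.*)` keeps that group from swallowing earlier rules, which is the other half of why the original pattern returned one big chunk.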

This still won't work if you need to properly match "nested" braces -- but then, nothing in the RE world will: among regular languages' many well-known limitations (regular languages are those that regular expressions can match) is that "properly nesting" any kind of open/closed parentheses is impossible (some incredibly-extended so-called REs manage to, but not Python's, and anybody with a CS background will find calling those expressions "regular" offensive anyway;-). If you need more general parsing than REs can afford, pyparsing or other full-fledged Python parsers are the right way to go.
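For a sense of what handling nesting involves, here is a rough hand-rolled sketch that tracks brace depth with a counter (for example, for an @media block wrapping several rules). This is not pyparsing and not a real CSS parser: the function name `top_level_blocks` is made up for illustration, and it assumes no braces appear inside strings or comments.

```python
def top_level_blocks(css_text):
    """Return (prelude, body) pairs for each top-level { ... } block.

    Hypothetical helper: counts brace depth so nested blocks stay inside
    their parent's body. Braces in strings or comments are NOT handled.
    """
    blocks = []
    depth = 0
    prelude = ""
    prelude_start = 0
    body_start = 0
    for i, ch in enumerate(css_text):
        if ch == "{":
            if depth == 0:
                # Text since the previous block is this block's selector/at-rule.
                prelude = css_text[prelude_start:i].strip()
                body_start = i + 1
            depth += 1
        elif ch == "}":
            if depth == 0:
                # Stray closing brace: skip it and restart the prelude after it.
                prelude_start = i + 1
                continue
            depth -= 1
            if depth == 0:
                blocks.append((prelude, css_text[body_start:i].strip()))
                prelude_start = i + 1
    return blocks


sample = "@media print { .a { color: red; } .b { margin: 0; } } #id_two { padding: 2px; }"
for prelude, body in top_level_blocks(sample):
    print(prelude, "->", body)
```

A non-greedy `{(.*?)}` on the same sample would end the @media block at the first `}`, which is exactly the limitation described above; a counter (or a proper parser) is what gets past it.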
