Jasim Muhammed

Reputation: 1396

How to remove some css properties using regular expression?

"outline-style: none; margin: 0px; padding: 2px; background-color: #eff0f8; color: #3b3a39; font-family: Georgia,'Times New Roman',Times,serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 18px; orphans: 2; text-align: center; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border: 1px solid #ebebeb; float: left;"

I have this as inline CSS. I would like to blank out all the properties starting with "background" and "font" using a regular expression. In inline CSS, the last property might not end with a semicolon.

I am using this code as a Django filter to remove those properties on the server side using Beautiful Soup:

def html_remove_attrs(value):
    soup = BeautifulSoup(value)
    print "hi"
    for tag in soup.findAll(True,{'style': re.compile(r'')}): 
        #tag.attrs = None
        #for attr in tag.attrs:
        #    if "class" in attr:
        #        tag.attrs.remove(attr)
        #    if "style" in attr:
        #        tag.attrs.remove(attr)
        for attr in tag.attrs:
            if "style" in attr:
                # TODO: remove the background and font properties here
                pass

    return soup

Upvotes: 1

Views: 2697

Answers (1)

ewan.chalmers

Reputation: 16255

I don't know about the details of your programming environment, but you asked for a regular expression. This regular expression will find property keys (plus colon and any space) as group 1 ($1) and property values as group 2 ($2):

 ((?:background|font)(?:[^:]+):(?:\s*))([^;]+)

The expression does not remove the property values. It finds them. How you remove them depends on your programming environment (language/libraries).
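As a rough illustration in Python (the language used in the question), the sketch below shows what the two groups capture. The shortened `style` string and the names `PATTERN` and `pairs` are made up for the example; only the pattern itself comes from this answer.

```python
import re

# Same pattern as above, written as a Python raw string.
PATTERN = re.compile(r"((?:background|font)(?:[^:]+):(?:\s*))([^;]+)")

# Shortened sample input (an assumption for this sketch); note that the
# last property has no trailing semicolon.
style = "margin: 0px; background-color: #eff0f8; font-size: 14px; float: left"

# findall returns one (group 1, group 2) tuple per match.
pairs = PATTERN.findall(style)
for key, value in pairs:
    print(repr(key), repr(value))
```

Because the pattern requires at least one character between the prefix and the colon, it matches names such as `background-color` and `font-size`, but not the bare shorthand properties `background:` or `font:`.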

But basically, you would be doing a global find/replace, replacing each whole match with $1. That keeps the property name (plus colon and space) and drops the value.

For example, using Java you could do this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is arbitrary; it is only here so the example compiles.
public class RemoveCssValues {

    public static void main(String[] args) throws Exception {

        // Test inputs: with and without a trailing semicolon
        String[] lines = {
            "outline-style: none; margin: 0px; padding: 2px; background-color: #eff0f8; color: #3b3a39; font-family: Georgia,'Times New Roman',Times,serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 18px; orphans: 2; text-align: center; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border: 1px solid #ebebeb; float: left;",
            "outline-style: none; margin: 0px; padding: 2px; background-color: #eff0f8; color: #3b3a39; font-family: Georgia,'Times New Roman',Times,serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 18px; orphans: 2; text-align: center; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border: 1px solid #ebebeb; float: left",
            "background-color: #eff0f8;",
            "background-color: #eff0f8",
        };

        // Group 1: property name, colon and spaces; group 2: property value
        String regex = "((?:background|font)(?:[^:]+):(?:\\s*))([^;]+)";

        Pattern p = Pattern.compile(regex);

        for (String s: lines) {
            StringBuffer sb = new StringBuffer();
            Matcher m = p.matcher(s);
            while (m.find()) {

                // capturing group(2) for debug purposes only,
                // just to get its length so we can fill that with '-'
                // to assist comparison of before and after
                String text = m.group(2);
                text = text.replaceAll(".", "-");
                m.appendReplacement(sb, "$1" + text);

                // for non-debug mode, just use this instead
                // m.appendReplacement(sb, "$1");
            }
            m.appendTail(sb);

            System.err.println("> " + s); // before
            System.err.println("< " + sb.toString()); // after
            System.err.println();
        }
    }
}
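
Since the question uses Python, here is a minimal sketch of the same find/replace there, using only the standard `re` module. `blank_values` applies the pattern from this answer and replaces each match with group 1. `drop_declarations` is a hypothetical variant that is not part of the answer above: it uses a different pattern to remove the whole declaration, including an optional trailing semicolon. The function names and the sample string are assumptions for illustration.

```python
import re

# The answer's pattern as a Python raw string.
VALUE_PATTERN = re.compile(r"((?:background|font)(?:[^:]+):(?:\s*))([^;]+)")

# Hypothetical variant (not from the answer): matches a whole declaration,
# with an optional trailing semicolon and any whitespace after it.
DECLARATION_PATTERN = re.compile(r"(?:background|font)[^:;]*:[^;]*;?\s*")


def blank_values(style):
    """Keep the property names but drop their values (replace with group 1)."""
    return VALUE_PATTERN.sub(r"\1", style)


def drop_declarations(style):
    """Remove the matching declarations entirely."""
    return DECLARATION_PATTERN.sub("", style).strip()


# Shortened sample input; the last property has no trailing semicolon.
style = "margin: 0px; background-color: #eff0f8; font-size: 14px; float: left"
print(blank_values(style))
print(drop_declarations(style))
```

In a Beautiful Soup loop like the one in the question, the result of either function could be assigned back to the tag's `style` attribute. Both patterns assume values never contain a semicolon, so quoted strings or data URIs with semicolons would need a real CSS parser.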

Upvotes: 2
