Reputation: 1396
"outline-style: none; margin: 0px; padding: 2px; background-color: #eff0f8; color: #3b3a39; font-family: Georgia,'Times New Roman',Times,serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 18px; orphans: 2; text-align: center; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border: 1px solid #ebebeb; float: left;"
I have this as inline CSS. I would like to use a regular expression to replace all the properties starting with "background" and "font" with blank space. In inline CSS, the last property might not end with a semicolon.
I am using this code as a Django filter to remove those properties on the server side using Beautiful Soup:
def html_remove_attrs(value):
    soup = BeautifulSoup(value)
    print "hi"
    for tag in soup.findAll(True, {'style': re.compile(r'')}):
        #tag.attrs = None
        #for attr in tag.attrs:
        #    if "class" in attr:
        #        tag.attrs.remove(attr)
        #    if "style" in attr:
        #        tag.attrs.remove(attr)
        for attr in tag.attrs:
            if "style" in attr:
                #remove the background and font properties
                pass
    return soup
Upvotes: 1
Views: 2697
Reputation: 16255
I don't know about the details of your programming environment, but you asked for a regular expression. This regular expression will find property keys (plus colon and any space) as group 1 ($1) and property values as group 2 ($2):
((?:background|font)(?:[^:]+):(?:\s*))([^;]+)
The expression does not remove the property values. It finds them. How you remove them depends on your programming environment (language/libraries).
But basically, you would be doing a global find/replace, replacing each whole match with $1, which keeps the property key and drops the value.
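Since the code in the question is Python, here is a rough sketch of that global replace using the standard `re` module. The function name `blank_values` is only for illustration, and in Python the replacement is written `\1` rather than `$1`.

```python
import re

# Same pattern as above, written as a Python raw string so that \s
# reaches the regex engine unchanged.
PATTERN = re.compile(r"((?:background|font)(?:[^:]+):(?:\s*))([^;]+)")


def blank_values(style):
    """Keep each matched property key (group 1) and drop its value (group 2)."""
    return PATTERN.sub(r"\1", style)


if __name__ == "__main__":
    # The keys stay in place; only the values are removed.
    print(repr(blank_values("margin: 0px; background-color: #eff0f8; font-size: 14px")))
```

Note that this leaves the emptied keys (for example `background-color: ;`) in the string, and that the pattern needs at least one character between `background`/`font` and the colon, so the shorthand `font:` or `background:` is not matched.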
For example, using Java you could do this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripCssValues {
    public static void main(String[] args) throws Exception {
        String[] lines = {
            "outline-style: none; margin: 0px; padding: 2px; background-color: #eff0f8; color: #3b3a39; font-family: Georgia,'Times New Roman',Times,serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 18px; orphans: 2; text-align: center; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border: 1px solid #ebebeb; float: left;",
            "outline-style: none; margin: 0px; padding: 2px; background-color: #eff0f8; color: #3b3a39; font-family: Georgia,'Times New Roman',Times,serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 18px; orphans: 2; text-align: center; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border: 1px solid #ebebeb; float: left",
            "background-color: #eff0f8;",
            "background-color: #eff0f8",
        };
        // Java string literal, so the backslash in \s is doubled
        String regex = "((?:background|font)(?:[^:]+):(?:\\s*))([^;]+)";
        Pattern p = Pattern.compile(regex);
        for (String s : lines) {
            StringBuffer sb = new StringBuffer();
            Matcher m = p.matcher(s);
            while (m.find()) {
                // capturing group(2) for debug purposes only,
                // just to get its length so we can fill that with '-'
                // to assist comparison of before and after
                String text = m.group(2);
                text = text.replaceAll(".", "-");
                m.appendReplacement(sb, "$1" + text);
                // for non-debug mode, just use this instead
                // m.appendReplacement(sb, "$1");
            }
            m.appendTail(sb);
            System.err.println("> " + s); // before
            System.err.println("< " + sb.toString()); // after
            System.err.println();
        }
    }
}
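The expression above keeps each property name and only blanks its value. If the goal is instead to drop the whole declaration, including a last one that has no trailing semicolon, a variant pattern might look like the following in Python. This is my own sketch rather than the expression above: `DECLARATION` and `strip_declarations` are made-up names, and it assumes that values never contain a literal `;` (for example inside a quoted string or a `url(...)`).

```python
import re

# Hypothetical variant: match one whole declaration whose property name
# starts with "background" or "font". The match must begin at the start of
# the string or right after a ";", and the closing ";" is optional so the
# last declaration is covered too.
DECLARATION = re.compile(
    r"(?:^|(?<=;))\s*(?:background|font)[\w-]*\s*:[^;]*;?",
    re.IGNORECASE,
)


def strip_declarations(style):
    """Return the style string without background* and font* declarations."""
    return DECLARATION.sub("", style).strip()


if __name__ == "__main__":
    sample = "outline-style: none; background-color: #eff0f8; color: #3b3a39; font-size: 14px"
    print(strip_declarations(sample))
```

Inside the Beautiful Soup loop from the question, the idea would be to pass each tag's `style` value through a function like this and write the result back. How the attribute is read and written depends on the Beautiful Soup version, so that part is left out here.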
Upvotes: 2