Reputation: 6211
I'm using BeautifulSoup to remove inline heights and widths from my elements. Solving it for images was simple:
def remove_dimension_tags(tag):
    for attribute in ["width", "height"]:
        del tag[attribute]
    return tag
But I'm not sure how to go about processing something like this:
<div id="attachment_9565" class="wp-caption aligncenter" style="width: 2010px;background-color:red">
when I would want to leave the background-color (for example) or any other style attributes other than height or width.
The only way I can think of doing it is with a regex but last time I suggested something like that the spirit of StackOverflow came out of my computer and murdered my first-born.
Upvotes: 1
Views: 1296
Reputation: 12158
import re
import bs4
html = '''<div id="attachment_9565" class="wp-caption aligncenter" style="width: 2010px;background-color:red">'''
soup = bs4.BeautifulSoup(html, 'lxml')
A Tag's attributes are stored in a dict (tag.attrs), so you can read and modify them like any other dict:
get item:
soup.div.attrs
{'class': ['wp-caption', 'aligncenter'],
'id': 'attachment_9565',
'style': 'width: 2010px;background-color:red'}
set item (this keeps only the last declaration, so it works here only because background-color comes last):
soup.div.attrs['style'] = soup.div.attrs['style'].split(';')[-1]
{'class': ['wp-caption', 'aligncenter'],
'id': 'attachment_9565',
'style': 'background-color:red'}
Or use a regex to pull out the one declaration you want to keep:
soup.div.attrs['style'] = re.search(r'background-color:\w+', soup.div.attrs['style']).group()
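Both snippets above depend on knowing which declaration to keep. If the goal is instead to drop width and height and keep everything else, whatever the order, one possible approach is to split the style value into declarations and filter by property name. This is only a sketch: strip_dimensions is a hypothetical helper name, and the naive split on ';' assumes no value contains a semicolon (a data: URL inside url(...) would break it).

```python
def strip_dimensions(style, unwanted=("width", "height")):
    """Return the inline style string without the unwanted properties."""
    kept = []
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        # Skip empty fragments (e.g. after a trailing ';') and unwanted names.
        if not sep or name.strip().lower() in unwanted:
            continue
        kept.append(f"{name.strip()}: {value.strip()}")
    return "; ".join(kept)


print(strip_dimensions("width: 2010px;background-color:red"))
# background-color: red
```

With a tag in hand this would be used as tag['style'] = strip_dimensions(tag['style']). Because property names are compared exactly, max-width or line-height are left alone.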
Upvotes: -1
Reputation: 43169
A full walk-through would be:
from bs4 import BeautifulSoup
import re
string = """
<div id="attachment_9565" class="wp-caption aligncenter" style="width: 2010px;background-color:red">
<p>Some line here</p>
<hr/>
<p>Some other beautiful text over here</p>
</div>
"""
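# match a width or height declaration: the property name, a colon,
# everything up to the next ';', and the ';' itself if present
rx = re.compile(r'(?:width|height):[^;]+;?')
soup = BeautifulSoup(string, "html5lib")
for div in soup.find_all('div', style=True):
    # substitute on the tag's own style value, not on the whole HTML string
    div['style'] = rx.sub("", div['style'])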
As others have said, running a regular expression over the attribute's value (rather than over the HTML itself) is not a problem.
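One caveat with a pattern like the one above: it also matches the tail of longer property names such as max-width or line-height. A hedged variant, assuming you want those left untouched, adds a lookbehind and tolerates whitespace around the colon. DIMENSION_RX and remove_dimensions are names made up for this sketch.

```python
import re

# (?<![\w-]) stops the match from starting inside names such as "max-width"
# or "line-height"; \s* tolerates spaces before the colon and after the ';'.
DIMENSION_RX = re.compile(r'(?<![\w-])(?:width|height)\s*:[^;]*;?\s*', re.IGNORECASE)


def remove_dimensions(style):
    """Strip width/height declarations from an inline style string."""
    return DIMENSION_RX.sub("", style).strip()


print(remove_dimensions("width: 2010px;background-color:red"))
# background-color:red
print(remove_dimensions("max-width: 10px; height: 5px; color: blue"))
# max-width: 10px; color: blue
```

If the result is an empty string, it is probably cleaner to remove the attribute altogether with del tag['style'] than to leave style="" behind.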
Upvotes: 2
Reputation: 8382
You could use regex if you want, but there is a simpler way.
Use cssutils for simpler CSS parsing.
A simple example:
from bs4 import BeautifulSoup
import cssutils
s = '<div id="attachment_9565" class="wp-caption aligncenter" style="width: 2010px;background-color:red">'
soup = BeautifulSoup(s, "html.parser")
div = soup.find("div")
div_style = cssutils.parseStyle(div["style"])
del div_style["width"]
div["style"] = div_style.cssText
print(div)
Outputs:
<div class="wp-caption aligncenter" id="attachment_9565" style="background-color: red"></div>
Upvotes: 2