Reputation: 162
I'm trying to remove the elements styled with display:none, along with their contents, from the source, but it's not working: there are no errors, the elements simply don't get decomposed. This is what I have:
source = BeautifulSoup(open("page.html"))
getbody = source.find('body')
for child in getbody[0].children:
    try:
        if child.get('style') is not None and child.get('style') == "display:none":
            # it in here
            child.decompose()
    except:
        continue
print source
# display:hidden div's are still there.
Upvotes: 1
Views: 3905
Reputation: 1121864
The following code does what you want. Avoid blanket except handling, which only masks bugs:
source = BeautifulSoup(open("page.html"))
for hidden in source.body.find_all(style='display:none'):
    hidden.decompose()
or better still, use a regular expression to cast the net a little wider:
import re
source = BeautifulSoup(open("page.html"))
for hidden in source.body.find_all(style=re.compile(r'display:\s*none')):
    hidden.decompose()
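To see what "a little wider" covers, here is a small sketch that runs the same pattern against a few style strings. The sample strings are made up for illustration. Beautiful Soup applies a compiled pattern with its search() method, so the check below uses search() as well.

```python
import re

# Same pattern as in the answer above.
HIDDEN = re.compile(r'display:\s*none')

# Hypothetical style attribute values, chosen only for illustration.
samples = [
    "display:none",                # exact form from the question
    "display: none",               # whitespace after the colon
    "color:red; display:  none;",  # embedded among other declarations
    "display : none",              # whitespace BEFORE the colon
    "DISPLAY:NONE",                # different case
    "display:block",               # not hidden at all
]

# search() looks for the pattern anywhere in the string.
matches = {s: bool(HIDDEN.search(s)) for s in samples}

for style, hit in matches.items():
    print("%-30r %s" % (style, hit))
```

The first three samples match and the last three do not. If the pages you process may contain upper-case declarations or a space before the colon, something like re.compile(r'display\s*:\s*none', re.I) would be one way to loosen the pattern further.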
Tag.children only lists direct children of the body tag, not all nested children.
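A minimal sketch of that difference, assuming Beautiful Soup 4 (the bs4 package) and a made-up document with one hidden div directly under body and one nested a level deeper. The markup and the helper names are hypothetical, not taken from the question's page.html.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: one hidden div is a direct child of <body>,
# the other sits inside a wrapper div.
HTML = """
<html><body>
<div style="display:none">top-level hidden</div>
<div id="wrapper">
  <p>visible</p>
  <div style="display:none">nested hidden</div>
</div>
</body></html>
"""

def hidden_among_children(soup):
    """Hidden tags found by walking only the direct children of <body>."""
    found = []
    for child in soup.body.children:
        # .children also yields text nodes, which have no attributes.
        if not isinstance(child, Tag):
            continue
        if child.get("style") == "display:none":
            found.append(child)
    return found

def hidden_anywhere(soup):
    """Hidden tags found by find_all(), which searches all descendants."""
    return soup.body.find_all(style="display:none")

soup = BeautifulSoup(HTML, "html.parser")
direct = [tag.get_text() for tag in hidden_among_children(soup)]
nested = [tag.get_text() for tag in hidden_anywhere(soup)]

# Remove every hidden element, then confirm none are left.
for tag in hidden_anywhere(soup):
    tag.decompose()
remaining = soup.find_all(style="display:none")
```

Here direct contains only the top-level div, while nested contains both, which is why a loop over .children can leave hidden elements behind. The explicit isinstance check also replaces the bare except from the question, so real errors are no longer swallowed.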
Upvotes: 2