user273324

Reputation: 162

BeautifulSoup removing tags

I'm trying to remove the tags styled with display:none, along with their contents, from the source, but it's not working: there are no errors, the tags simply aren't decomposed. This is what I have:

source = BeautifulSoup(open("page.html"))
getbody = source.find('body')
for child in getbody[0].children:
    try:
        if child.get('style') is not None and child.get('style') == "display:none":
            # it in here
            child.decompose()
    except:
        continue
print source
# The display:none divs are still there.

Upvotes: 1

Views: 3905

Answers (1)

Martijn Pieters

Reputation: 1121864

The following code does what you want and works fine; do not use blanket except handling to mask bugs:

from bs4 import BeautifulSoup

with open("page.html") as f:
    source = BeautifulSoup(f, "html.parser")  # name the parser explicitly
for hidden in source.body.find_all(style='display:none'):
    hidden.decompose()

or better still, use a regular expression to cast the net a little wider, so that values with whitespace after the colon (such as display: none) also match:

import re

from bs4 import BeautifulSoup

with open("page.html") as f:
    source = BeautifulSoup(f, "html.parser")  # name the parser explicitly
for hidden in source.body.find_all(style=re.compile(r'display:\s*none')):
    hidden.decompose()
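
To get a feel for what the regular expression catches, here is a small sketch that runs the same pattern over a few style values. The values in the list are made up for illustration; BeautifulSoup applies a compiled pattern with search(), so the pattern may match anywhere inside the attribute value.

```python
import re

pattern = re.compile(r'display:\s*none')

# Hypothetical style attribute values, not taken from the asker's page.
styles = [
    "display:none",
    "display: none",
    "color:red; display:  none;",
    "display:block",
    "display : none",  # space before the colon: this pattern does not match it
]

# search() looks for the pattern anywhere in the string.
matched = [s for s in styles if pattern.search(s)]
print(matched)
```

If the page may also contain spaces before the colon, a pattern such as r'display\s*:\s*none' would cover that case as well.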

Your loop missed them because Tag.children only lists the direct children of a tag (here, body), not all nested descendants. find_all() searches the whole subtree by default.
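
The difference between the two can be seen in a minimal, self-contained sketch. The markup below is an invented sample (one hidden div directly under body, one nested inside a wrapper), and html.parser is used only so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical sample markup, standing in for page.html.
html = """
<html><body>
<div style="display:none">top-level hidden</div>
<div id="wrapper">
  <p>visible</p>
  <div style="display:none">nested hidden</div>
</div>
</body></html>
"""

source = BeautifulSoup(html, "html.parser")

# .children yields only body's direct children, including whitespace
# text nodes, so filter down to Tag objects before calling .get().
direct_hidden = [
    child for child in source.body.children
    if isinstance(child, Tag) and child.get("style") == "display:none"
]

# find_all() walks every descendant of body.
all_hidden = source.body.find_all(style="display:none")

n_direct = len(direct_hidden)
n_all = len(all_hidden)
print(n_direct, n_all)

# find_all() returns a list, so decomposing while looping over it is safe.
for hidden in all_hidden:
    hidden.decompose()

remaining = source.body.find_all(style="display:none")
print(remaining)
```

Iterating over .children finds only the top-level hidden div, while find_all() finds both, which is why the nested ones survived in the question's version.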

Upvotes: 2
