Remove redundant beautifulsoup html tags

Question

How can I remove "redundant" HTML tags inside a BeautifulSoup object?

In the example of

how can I remove redundant

tags (redundant in that they only add depth but contain no additional information or attributes) to get the following structure:


 
       
        Close

In graph terms, I am trying to merge nodes of the BeautifulSoup tree that contain neither strings nor attributes.

DaveTheAl · Accepted Answer

I just created a code-snippet that seems to do the job:

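        # soup() is shorthand for soup.find_all(): every tag, in document order.
        for x in reversed(soup()):
            # Unwrap tags with no string, no attributes, and at most one child tag.
            if not x.string and not x.attrs and len(x.find_all(recursive=False)) <= 1:
                x.unwrap()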

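The `reversed` is needed so that the deepest tags are handled first. Otherwise an empty tag that is about to be removed still counts as a child of its parent (next to its siblings), which blocks the parent from being unwrapped.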
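To see the effect of the iteration order, here is a minimal, self-contained sketch. The sample markup, the `unwrap_redundant` helper, and its `deepest_first` flag are my own assumptions for illustration and are not from the question. It applies the same condition as the snippet above, once deepest-first and once in document order:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (not the markup from the question).
SAMPLE = """
<div id="main">
  <div>
    <div></div>
    <div>
      <a href="#">Close</a>
    </div>
  </div>
</div>
"""

def unwrap_redundant(html, deepest_first=True):
    """Unwrap tags with no string, no attributes, and at most one child tag."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all(True)  # every tag, in document order (same as soup())
    if deepest_first:
        tags = list(reversed(tags))  # visit the deepest tags first
    for tag in tags:
        child_tags = tag.find_all(True, recursive=False)
        if not tag.string and not tag.attrs and len(child_tags) <= 1:
            tag.unwrap()
    return soup

deepest = unwrap_redundant(SAMPLE)
in_order = unwrap_redundant(SAMPLE, deepest_first=False)

print(len(deepest.find_all("div")))   # 1: only <div id="main"> is left
print(len(in_order.find_all("div")))  # 2: the middle wrapper survives
```

In document order, the middle wrapper is checked while it still has two child tags (the empty `<div>` and the inner wrapper), so it is kept. Deepest-first, both children are already gone by the time it is checked, so it is unwrapped as well. Two caveats: `.string` also follows a chain of single children, so on markup without whitespace between tags the condition may skip wrappers you would expect it to remove; and the condition also unwraps a tag that mixes direct text with one child tag, such as `<p>Hello <b>x</b></p>`.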
