Simplify nested HTML with Beautiful Soup

Question

I am cleaning up arbitrary HTML for printing. I don't need to preserve the structure, because I control the CSS selectors, and a simpler tree seems to cause fewer errors.

Is there an idiomatic way in Beautiful Soup to reduce nesting, or do I just need to do the hard yards and manage the tree myself?

As a very simplified example, can I make this:

from bs4 import BeautifulSoup

doc = """

    
        
            
                
                    
                        Hello
                    
                
            
            
                
                    
                        World!
                    
                
            
        
    

"""

soup = BeautifulSoup(doc, "html.parser")

print(soup.prettify())

print this instead:


  Hello
  World!

I'm open to non-bs4 methods too; bs4 just seems to be the cleanest way to deal with HTML.
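
For reference, the "manage the tree myself" route might look roughly like the sketch below. It assumes every wrapper is a <div> and that only the innermost ones carry text; the function name `flatten` and its `wrapper` parameter are made up for illustration and are not part of bs4. It keeps the leaf elements and re-parents them under a single new root.

```python
from bs4 import BeautifulSoup


def flatten(html, wrapper="div"):
    """Collect leaf `wrapper` tags and place them under one new root tag.

    Hypothetical helper: assumes all nesting uses the same tag name
    and that intermediate wrappers hold no text of their own.
    """
    soup = BeautifulSoup(html, "html.parser")

    # A leaf is a wrapper tag that contains no other tags.
    leaves = [tag for tag in soup.find_all(wrapper) if tag.find(True) is None]

    # Build a fresh document with a single root wrapper.
    out = BeautifulSoup("", "html.parser")
    root = out.new_tag(wrapper)
    out.append(root)

    # Detach each leaf from the old tree and attach it to the new root.
    for leaf in leaves:
        root.append(leaf.extract())
    return out


nested = (
    "<div><div><div><div>Hello</div></div>"
    "<div><div>World!</div></div></div></div>"
)
print(flatten(nested).prettify())
```

This discards everything except the leaves, so any text sitting directly inside an intermediate wrapper would be lost; that is acceptable for my case but probably not in general.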
