BeautifulSoup not handling HTML table inside anchor tag

Question

Consider the sample HTML code:

<html>
<head>
    <title>Testing</title>
</head>
<body>
    <a>
        <table>
            <tr>
                <td>Hello</td>
            </tr>
        </table>
    </a>
</body>
</html>

When I parse this with BeautifulSoup via html_soup = BeautifulSoup(html_source_code, "lxml"), I get:



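<html>
<head>
    <title>Testing</title>
</head>
<body>
    <a>
    </a>
    <table>
        <tr>
            <td>Hello</td>
        </tr>
    </table>
</body>
</html>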

Note how the table is no longer contained within the anchor tag, thereby altering the output.

I have run the source code through online validators (e.g. https://validator.w3.org/) and they return no errors or warnings, so I believe there is nothing wrong with the HTML itself.

Why does BeautifulSoup do this, and how can I fix it? P.S. In my real use case it is not trivial to move the anchor tags onto inner elements, because of pre-defined CSS and JS features.

Rakesh · Accepted Answer

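Use "html.parser" instead of "lxml".

The lxml parser is built on libxml2, whose HTML parser follows older, HTML 4 style nesting rules: a table is a block-level element and may not sit inside an inline a element, so the parser closes the anchor before the table starts. HTML5 does allow this nesting, which is why the validator accepts your markup. Python's built-in html.parser does not try to repair the nesting and keeps the tree as written.

Example: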

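from bs4 import BeautifulSoup

# Sample markup: a table nested inside an anchor tag
html_source_code = """
<html>
<head>
    <title>Testing</title>
</head>
<body>
    <a>
        <table>
            <tr>
                <td>Hello</td>
            </tr>
        </table>
    </a>
</body>
</html>
"""

# The built-in parser keeps the table inside the anchor
html_soup = BeautifulSoup(html_source_code, "html.parser")
print(html_soup.prettify(formatter="html"))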

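Output:

<html>
 <head>
  <title>
   Testing
  </title>
 </head>
 <body>
  <a>
   <table>
    <tr>
     <td>
      Hello
     </td>
    </tr>
   </table>
  </a>
 </body>
</html>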
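
If you want to check where the table ends up under each parser rather than reading the prettified output, a small sketch like the one below may help. The sample markup is my reconstruction of the structure described in the question (the anchor's attributes are not shown in the post, so a bare a tag is assumed), and table_inside_anchor is a hypothetical helper name, not part of bs4. It only relies on find, find_parent, and the FeatureNotFound exception that bs4 raises when a parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Assumed sample: same structure as in the question, with a bare <a> tag.
SAMPLE_HTML = """
<html>
<head>
    <title>Testing</title>
</head>
<body>
    <a>
        <table>
            <tr>
                <td>Hello</td>
            </tr>
        </table>
    </a>
</body>
</html>
"""


def table_inside_anchor(markup, parser):
    """Return True if the first <table> has an <a> ancestor.

    Returns False if it does not (or if there is no table at all),
    and None if the requested parser is not installed.
    """
    try:
        soup = BeautifulSoup(markup, parser)
    except FeatureNotFound:
        return None
    table = soup.find("table")
    return table is not None and table.find_parent("a") is not None


if __name__ == "__main__":
    # lxml and html5lib are optional third-party installs.
    for parser in ("html.parser", "lxml", "html5lib"):
        print(parser, table_inside_anchor(SAMPLE_HTML, parser))
```

With "html.parser" the helper should report True. Going by the behaviour described in the question, "lxml" would be expected to report False, though that can depend on the installed lxml/libxml2 version, so it is worth running on your own setup.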
