Use Beautiful Soup to Extract Multiple Tables And Headers

Question

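I have a piece of HTML structured similarly to:

<div id="MainSection">
  <h3>"Title of the table below"</h3>
  <table> ... </table>
  <h3>"Title of the table below"</h3>
  <table> ... </table>
  <h3>"Title of the table below"</h3>
  <table> ... </table>
  ETC...
</div>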

I can strip the 'TR' elements fairly easily, creating one big table, but I need a way to retain the structure of each individual table and get the title for each one.

There are an unknown number of lists and there will be one header for each list. 

I am fairly new to Python and very new to web scraping.

QHarr · Accepted Answer

I don't know what the expected output should be, but with the HTML above you could select the h3 and table elements into a single node list, then loop over it, testing tag.name and handling each element accordingly:

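from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup as bs

html = '''
<html>
 <body>
  <div id="MainSection">
   <h3>"Title of the table below"</h3>
   <table>
    <tr>
     <td>table1</td>
     <td>x</td>
    </tr>
   </table>
   <h3>"Title of the table below2"</h3>
   <table>
    <tr>
     <td>table2</td>
     <td>y</td>
    </tr>
   </table>
  </div>
 </body>
</html>
'''

soup = bs(html, 'lxml')

# One selector list matches both headers and tables, in document order
for item in soup.select('#MainSection h3, #MainSection table'):
    if item.name == 'h3':
        # A header: remember and print its text
        header = item.text
        print(header)
    else:
        # A table: parse it into a DataFrame (StringIO avoids the
        # deprecation warning newer pandas raises for literal HTML strings)
        table = pd.read_html(StringIO(str(item)))[0]
        print(table)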
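
If the goal is to keep each table separate and paired with its title, one possible variation on the loop above is to collect the results instead of printing them. The following is only a sketch under assumptions: the sample markup, the `MainSection` id, and the helper name `tables_by_header` are illustrative rather than taken from the real page, and it extracts cell text directly with Beautiful Soup instead of using pandas.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the "MainSection" id and the
# h3-followed-by-table layout are assumptions about the real page.
SAMPLE_HTML = """
<div id="MainSection">
  <h3>First title</h3>
  <table><tr><td>table1</td><td>x</td></tr></table>
  <h3>Second title</h3>
  <table><tr><td>table2</td><td>y</td></tr></table>
</div>
"""


def tables_by_header(html):
    """Return a list of (header_text, rows) pairs in document order."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    header = None  # most recent h3 seen; stays None if a table comes first
    for item in soup.select("#MainSection h3, #MainSection table"):
        if item.name == "h3":
            header = item.get_text(strip=True)
        else:
            # Each row becomes a list of cell strings
            rows = [
                [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
                for tr in item.find_all("tr")
            ]
            results.append((header, rows))
    return results


pairs = tables_by_header(SAMPLE_HTML)
for title, rows in pairs:
    print(title, rows)
```

A list of pairs is used rather than a dict so that repeated titles (as in the question's example) do not overwrite each other. Each `rows` list could be passed to `pd.DataFrame(rows)` if a DataFrame per table is preferred.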
