Extract table from HTML by position using Python

Question

I want to extract a specific table from an HTML document that contains multiple tables, but unfortunately the tables have no identifiers. Each table does have a title, however. I can't figure out how to select just one.

Here is an example HTML file:

TABLE 1
Data 1    Data 2
Data 3    Data 4
Data 5    Data 6

TABLE 2
Data 7    Data 8
Data 9    Data 10
Data 11   Data 12

I can use BeautifulSoup 4 to get tables by id or name, but I need a single table that is identifiable only by its position.

I know that I can get the first table with:

from bs4 import BeautifulSoup

tmp = f.read()  # f is an already opened file object
soup = BeautifulSoup(tmp, 'html.parser')  # parse the HTML
table = soup.find('table')  # returns the first <table>

But how would I get the second table?

alecxe · Accepted Answer

You can rely on the table title.

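Find the text node that contains the title by passing a function as the string argument (older BeautifulSoup versions call this argument text), then walk up to the enclosing table with find_parent:

table_name = "TABLE 1"

# the lambda guards against None before testing for the title
table = soup.find(string=lambda x: x and table_name in x).find_parent('table')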
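As a minimal, self-contained sketch of the lookup above, the snippet below builds a small document and selects a table both by its title and by its position. The markup in HTML is an assumption (the tags of the original file are not shown), and the helper names table_by_title, table_by_position and cell_texts are hypothetical, introduced only for illustration. The positional variant relies on find_all returning tables in document order, so index 1 is the second table.

```python
from bs4 import BeautifulSoup

# Assumed markup: the real file's tags are not shown in the question.
HTML = """
<html><body>
<table>
  <tr><th colspan="2">TABLE 1</th></tr>
  <tr><td>Data 1</td><td>Data 2</td></tr>
</table>
<table>
  <tr><th colspan="2">TABLE 2</th></tr>
  <tr><td>Data 7</td><td>Data 8</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")


def table_by_title(soup, title):
    """Return the <table> enclosing a text node that contains `title`, or None."""
    node = soup.find(string=lambda s: s and title in s)
    return node.find_parent("table") if node else None


def table_by_position(soup, index):
    """Return the table at zero-based `index` in document order, or None."""
    tables = soup.find_all("table")
    return tables[index] if 0 <= index < len(tables) else None


def cell_texts(table):
    """Collect the stripped text of every <td> cell in the table."""
    return [td.get_text(strip=True) for td in table.find_all("td")]


second = table_by_title(soup, "TABLE 2")
print(cell_texts(second))
print(table_by_position(soup, 1) is second)
```

Note that the title match is a substring test, so "TABLE 1" would also match a title such as "TABLE 10"; comparing the stripped string for equality is a stricter alternative if titles can share a prefix.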
