Reputation: 945
Here's what my HTML looks like:
<head> ... </head>
<body>
<div>
<h2>Something really cool here<h2>
<div class="section mylist">
<table id="list_1" class="table">
<thead> ... not important <thead>
<tr id="blahblah1"> <td> ... </td> </tr>
<tr id="blah2"> <td> ... </td> </tr>
<tr id="bl3"> <td> ... </td> </tr>
</table>
</div>
</div>
</body>
There are four occurrences of this div in my HTML file. Each table's content is different and each h2 text is different; everything else is more or less the same. So far I've been able to extract the parent of each h2. However, I'm not sure how to extract each tr so that I can then get the td that I really need.
Here is the code I've written so far...
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('myhtml.html'), 'html.parser')
currently_watching = soup.find('h2', text='Something really cool here')
parent = currently_watching.parent
Upvotes: 1
Views: 5355
Reputation: 402483
I would suggest finding the parent div, which actually encloses the table, and then searching for all td tags. Here's how you'd do it:
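from bs4 import BeautifulSoup

# The lxml parser is a separate install (pip install lxml)
with open('myhtml.html') as f:
    soup = BeautifulSoup(f, 'lxml')

# The div that directly encloses the table
div = soup.find('div', class_='section mylist')
for td in div.find_all('td'):
    print(td.text)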
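If you need the cells grouped per heading and per row (there are four such sections in the question), something along these lines might work. This is only a sketch: the HTML string is a made-up, well-formed stand-in for the real file, `cells_by_heading` is a hypothetical helper name, and it assumes each h2 is immediately followed by its section div as in the snippet above. It uses the built-in `html.parser` so it runs without extra installs; pass `'lxml'` instead if that is what works for your file.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed stand-in for the page in the question.
HTML = """
<div>
  <h2>Something really cool here</h2>
  <div class="section mylist">
    <table id="list_1" class="table">
      <tr id="blahblah1"><td>a1</td><td>a2</td></tr>
      <tr id="blah2"><td>b1</td></tr>
    </table>
  </div>
</div>
<div>
  <h2>Another heading</h2>
  <div class="section mylist">
    <table id="list_2" class="table">
      <tr id="x1"><td>c1</td></tr>
    </table>
  </div>
</div>
"""


def cells_by_heading(html, parser="html.parser"):
    """Map each h2 text to a list of rows, each row a list of td texts."""
    soup = BeautifulSoup(html, parser)
    result = {}
    for h2 in soup.find_all("h2"):
        # Assumption: the table sits in the div that follows the heading.
        section = h2.find_next_sibling("div", class_="section")
        if section is None:
            continue
        rows = []
        for tr in section.find_all("tr"):
            rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
        result[h2.get_text(strip=True)] = rows
    return result


tables = cells_by_heading(HTML)
for heading, rows in tables.items():
    print(heading, rows)
```

Starting from each h2 and walking to its sibling div keeps the heading and its table together, so you can pick out one section by its heading text afterwards.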
Upvotes: 2
Reputation: 945
Searched around a bit and realized that it was my parser that was causing the issue. I installed lxml and everything works fine now.
Why is BeautifulSoup not finding a specific table class?
Upvotes: 0