Reputation: 3567
How do I position BS4 to start at the table that follows <h3>64-bit deb for Ubuntu/Debian</h3>? There are many tables, and the only thing that distinguishes them is the header.
<h3>Windows 64-bit</h3>
<table width="600" border="1" align="center">
:
</table>
:
<h3>64-bit deb for Ubuntu/Debian</h3>
<table width="600" border="1" align="center">
:
</table>
:
Upvotes: 1
Views: 724
Reputation: 84465
With bs4 4.7.1+ you can use the :contains pseudo-class together with the adjacent sibling (+) combinator, so no loop is needed.
from bs4 import BeautifulSoup as bs
html = '''<h3>Windows 64-bit</h3>
<table width="600" border="1" align="center">
:
</table>
:
<h3>64-bit deb for Ubuntu/Debian</h3>
<table width="600" border="1" align="center">
:'''
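# Parse the snippet (the 'lxml' parser requires the lxml package to be installed)
soup = bs(html, 'lxml')
# Select the table that immediately follows the h3 containing the given text
table = soup.select_one('h3:contains("64-bit deb for Ubuntu/Debian") + table')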
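On newer installs the same selector may emit a deprecation warning, because Soup Sieve 2.1 and later prefer the spelling :-soup-contains over :contains. The following is a minimal sketch of that variant, not a definitive version: the table cell contents are made up for illustration, and it assumes the built-in 'html.parser' instead of lxml so that no extra package is needed.

```python
from bs4 import BeautifulSoup

# Sample markup; the <td> contents are placeholders added for illustration
html = '''<h3>Windows 64-bit</h3>
<table width="600" border="1" align="center"><tr><td>windows</td></tr></table>
<h3>64-bit deb for Ubuntu/Debian</h3>
<table width="600" border="1" align="center"><tr><td>deb</td></tr></table>'''

# 'html.parser' ships with Python; swap in 'lxml' if it is installed
soup = BeautifulSoup(html, 'html.parser')

# h3 whose text contains the string, then the table that is its next element sibling
table = soup.select_one('h3:-soup-contains("64-bit deb for Ubuntu/Debian") + table')

# select_one returns None when nothing matches, so guard before using the result
cell_text = table.td.get_text() if table is not None else None
print(cell_text)
```

Note that the + combinator only matches when the table is the very next element after the h3; if other elements can sit in between, the general sibling combinator (~) may be a better fit.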
Upvotes: 2
Reputation: 3910
Would this work?
>>> for header in soup.find_all('h3'):
...     if header.get_text() == '64-bit deb for Ubuntu/Debian':
...         header.find_next_sibling()
...
<table align="center" border="1" width="600">
:
</table>
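If you need this in a script rather than at the prompt, the loop can be wrapped in a small function. This is only a sketch under a few assumptions: table_after_heading is a hypothetical helper name, the table cell contents are placeholders, and 'html.parser' is used so that nothing beyond bs4 is required. Passing 'table' to find_next_sibling skips any non-table elements that might sit between the header and its table.

```python
from bs4 import BeautifulSoup

def table_after_heading(soup, heading_text):
    """Return the first <table> sibling after the <h3> whose text equals heading_text."""
    for header in soup.find_all('h3'):
        # strip=True ignores stray whitespace around the heading text
        if header.get_text(strip=True) == heading_text:
            return header.find_next_sibling('table')
    return None  # no matching heading found

# Sample markup; the <td> contents are placeholders added for illustration
html = '''<h3>Windows 64-bit</h3>
<table width="600" border="1" align="center"><tr><td>windows</td></tr></table>
<h3>64-bit deb for Ubuntu/Debian</h3>
<table width="600" border="1" align="center"><tr><td>deb</td></tr></table>'''

soup = BeautifulSoup(html, 'html.parser')
deb_table = table_after_heading(soup, '64-bit deb for Ubuntu/Debian')
missing = table_after_heading(soup, 'No such heading')
```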
Upvotes: 3