MortenB

Reputation: 3567

Beautiful Soup: get the table after a specific header

How do I position BS4 at the table that follows <h3>64-bit deb for Ubuntu/Debian</h3>? The page has lots of tables, and the only thing that distinguishes this one is the header before it.

<h3>Windows 64-bit</h3>
<table width="600" border="1" align="center">
:
</table>
:
<h3>64-bit deb for Ubuntu/Debian</h3>
<table width="600" border="1" align="center">
:
</table>
:

Upvotes: 1

Views: 724

Answers (2)

QHarr

Reputation: 84465

With bs4 4.7.1+ you can use the :contains pseudo-class together with the adjacent sibling (+) combinator. No need for a loop.

from bs4 import BeautifulSoup as bs

html = '''<h3>Windows 64-bit</h3>
<table width="600" border="1" align="center">
:
</table>
:
<h3>64-bit deb for Ubuntu/Debian</h3>
<table width="600" border="1" align="center">
:
</table>'''
soup = bs(html, 'lxml')  # needs the lxml package; 'html.parser' works too
# Select the <h3> whose text contains the header, then the <table> right after it
table = soup.select_one('h3:contains("64-bit deb for Ubuntu/Debian") + table')
print(table)
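A self-contained sketch of the same selector, assuming soupsieve 2.1 or later, where :contains() still works but is deprecated in favour of :-soup-contains(). It uses the built-in html.parser so lxml is not required, and the table cell text ("windows.zip", "ubuntu.deb") is a hypothetical placeholder, not from the question.

```python
from bs4 import BeautifulSoup

# Markup modelled on the question; the cell contents are hypothetical placeholders.
html = '''
<h3>Windows 64-bit</h3>
<table width="600" border="1" align="center"><tr><td>windows.zip</td></tr></table>
<h3>64-bit deb for Ubuntu/Debian</h3>
<table width="600" border="1" align="center"><tr><td>ubuntu.deb</td></tr></table>
'''

soup = BeautifulSoup(html, 'html.parser')

# ":-soup-contains()" is the non-deprecated spelling of ":contains()" (soupsieve >= 2.1).
# It matches on a substring of the element's text.
# "+" is the adjacent sibling combinator: the <table> must be the very next element
# after the <h3> (whitespace text nodes in between are ignored).
table = soup.select_one('h3:-soup-contains("64-bit deb for Ubuntu/Debian") + table')

print(table.td.get_text())  # ubuntu.deb
```

Because the match is a substring match, a longer header such as "64-bit deb for Ubuntu/Debian (beta)" would also be selected.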

Upvotes: 2

Hryhorii Pavlenko

Reputation: 3910

Would this work? Loop over the <h3> tags and take the next sibling of the one whose text matches:

>>> for header in soup.find_all('h3'):
...     if header.get_text(strip=True) == '64-bit deb for Ubuntu/Debian':
...         header.find_next_sibling('table')
...
<table align="center" border="1" width="600">
:
</table>
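The loop can also be replaced by find() with the string argument, which matches an <h3> whose text is exactly the given header. Passing 'table' to find_next_sibling() skips any non-table siblings in between. This is a sketch assuming the header text has no surrounding whitespace; the <p> element and the cell text are hypothetical, added only to show the skipping behaviour.

```python
from bs4 import BeautifulSoup

# Markup modelled on the question; the <p> note and cell text are hypothetical.
html = '''
<h3>Windows 64-bit</h3>
<table width="600" border="1" align="center"><tr><td>windows.zip</td></tr></table>
<h3>64-bit deb for Ubuntu/Debian</h3>
<p>Some note between the header and the table</p>
<table width="600" border="1" align="center"><tr><td>ubuntu.deb</td></tr></table>
'''

soup = BeautifulSoup(html, 'html.parser')

# string= matches tags whose .string equals the given text exactly.
header = soup.find('h3', string='64-bit deb for Ubuntu/Debian')

# Walk forward through the header's siblings to the first <table>, skipping the <p>.
table = header.find_next_sibling('table') if header is not None else None

if table is not None:
    print(table.td.get_text())  # ubuntu.deb
```

Unlike the "+" combinator, find_next_sibling('table') still finds the table when another element sits between it and the header.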

Upvotes: 3
