Reputation: 434
I want to select the elements of a table that come under <i>Member</i>.
The HTML code:
<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>:
<a href="/shdb-team/20-739/" class="chip team">SHDB Team</a><a href="/the-spider-society/20-490/" class="chip team">The Spider Society</a><a href="/new-warriors/20-79/" class="chip team">New Warriors</a><a href="/the-six/20-474/" class="chip team">The Six</a>
<i>Member</i>:
<a href="/the-mighty-avengers/20-384/" class="chip team">The Mighty Avengers</a><a href="/new-avengers/20-101/" class="chip team">New Avengers</a><a href="/shield/20-467/" class="chip team">S.H.I.E.L.D.</a><a href="/avengers-resistance/20-154/" class="chip team">Avengers Resistance</a><a href="/marvel-knights/20-377/" class="chip team">Marvel Knights</a><a href="/avengers/20-4/" class="chip team">Avengers</a><a href="/secret-defenders/20-96/" class="chip team">Secret Defenders</a><a href="/daily-bugle/20-216/" class="chip team">Daily Bugle</a><a href="/defenders/20-9/" class="chip team">Defenders</a>
<i>Formerly</i>:
<a href="/future-foundation/20-290/" class="chip team">Future Foundation</a><a href="/heroes-for-hire/20-5/" class="chip team">Heroes For Hire</a><a href="/fantastic-four/20-1/" class="chip team">Fantastic Four</a> </td>
How do I select only the text under Member, for example?
I tried:
member = []
li = bs.find('i', string="Member")
children = li.find_next_siblings()
for child in children:
    member.append(child.text)
print(member)
But it returns every team, not just the Member ones:
SHDB Team
The Spider Society
New Warriors
The Six
Member
The Mighty Avengers
New Avengers
S.H.I.E.L.D.
Avengers Resistance
Marvel Knights
Avengers
Secret Defenders
Daily Bugle
Defenders
Formerly
Future Foundation
Heroes For Hire
Fantastic Four
I only want the Member section. Slicing the list gives me everything after "Member" and before "Formerly", but it feels like an inefficient workaround:
teams[teams.index("Member")+1:teams.index("Formerly")]
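The same cut can be made while walking the siblings, so nothing past the Member section is read and no index lookups are needed. A minimal sketch, assuming the markup keeps the question's layout (the HTML below is trimmed to a few teams per label): itertools.takewhile consumes next_siblings lazily and stops at the first i tag after Member.

```python
from itertools import takewhile

from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (a few teams per label).
html = """
<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>:
<a href="/shdb-team/20-739/" class="chip team">SHDB Team</a><a href="/the-six/20-474/" class="chip team">The Six</a>
<i>Member</i>:
<a href="/new-avengers/20-101/" class="chip team">New Avengers</a><a href="/defenders/20-9/" class="chip team">Defenders</a>
<i>Formerly</i>:
<a href="/fantastic-four/20-1/" class="chip team">Fantastic Four</a> </td>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
member_label = soup.find("i", string="Member")

# next_siblings is a generator that also yields text nodes (their .name is None);
# takewhile stops at the next <i> label, and the filter keeps only <a> tags.
member = [
    tag.get_text(strip=True)
    for tag in takewhile(lambda t: t.name != "i", member_label.next_siblings)
    if tag.name == "a"
]
print(member)
```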
Upvotes: 2
Views: 135
Reputation: 16187
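All the i tags are following siblings of one another inside the same td, told apart only by their text. So you can use a CSS selector with :nth-of-type(2) to pick the second i (Member), then walk its next siblings until you reach the next i. This relies on html.parser, which keeps the td as a direct child of table; a parser such as html5lib inserts tbody and tr, and the > combinators would then no longer match.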
from bs4 import BeautifulSoup
html = """
<table class="table profile-table">
<td>
Teams
</td>
<td>
<i>
Leader
</i>
:
<a class="chip team" href="/shdb-team/20-739/">
SHDB Team
</a>
<a class="chip team" href="/the-spider-society/20-490/">
The Spider Society
</a>
<a class="chip team" href="/new-warriors/20-79/">
New Warriors
</a>
<a class="chip team" href="/the-six/20-474/">
The Six
</a>
<i>
Member
</i>
:
<a class="chip team" href="/the-mighty-avengers/20-384/">
The Mighty Avengers
</a>
<a class="chip team" href="/new-avengers/20-101/">
New Avengers
</a>
<a class="chip team" href="/shield/20-467/">
S.H.I.E.L.D.
</a>
<a class="chip team" href="/avengers-resistance/20-154/">
Avengers Resistance
</a>
<a class="chip team" href="/marvel-knights/20-377/">
Marvel Knights
</a>
<a class="chip team" href="/avengers/20-4/">
Avengers
</a>
<a class="chip team" href="/secret-defenders/20-96/">
Secret Defenders
</a>
<a class="chip team" href="/daily-bugle/20-216/">
Daily Bugle
</a>
<a class="chip team" href="/defenders/20-9/">
Defenders
</a>
<i>
Formerly
</i>
:
<a class="chip team" href="/future-foundation/20-290/">
Future Foundation
</a>
<a class="chip team" href="/heroes-for-hire/20-5/">
Heroes For Hire
</a>
<a class="chip team" href="/fantastic-four/20-1/">
Fantastic Four
</a>
</td>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
for i in soup.select_one('.table.profile-table > td > i:nth-of-type(2)').next_siblings:
    if i.name == 'i':
        break
    if i.name == 'a':
        print(i.get_text(strip=True))
Output:
The Mighty Avengers
New Avengers
S.H.I.E.L.D.
Avengers Resistance
Marvel Knights
Avengers
Secret Defenders
Daily Bugle
Defenders
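If you later need every section rather than just Member, a single pass over the children of the td can group the teams under the most recent i label. A minimal sketch, assuming each label is an i tag followed by its a tags as in the question's markup (trimmed here); the names cell and sections are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (a few teams per label).
html = """
<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>:
<a href="/shdb-team/20-739/" class="chip team">SHDB Team</a><a href="/the-six/20-474/" class="chip team">The Six</a>
<i>Member</i>:
<a href="/new-avengers/20-101/" class="chip team">New Avengers</a><a href="/defenders/20-9/" class="chip team">Defenders</a>
<i>Formerly</i>:
<a href="/fantastic-four/20-1/" class="chip team">Fantastic Four</a> </td>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
cell = soup.find("i").parent  # the <td> that holds all the labels

sections = {}
current = None
for child in cell.children:
    if child.name == "i":
        # Each <i> label opens a new section.
        current = child.get_text(strip=True)
        sections[current] = []
    elif child.name == "a" and current is not None:
        # <a> tags belong to the most recent label; text nodes are skipped.
        sections[current].append(child.get_text(strip=True))

print(sections["Member"])
```

The dictionary is built in one walk, so asking for sections["Leader"] or sections["Formerly"] afterwards costs nothing extra.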
Upvotes: 1
Reputation: 25048
You could iterate over the next_siblings of the element, print each sibling whose tag name is a, and break out of the loop as soon as you reach the next i tag:
for tag in soup.select_one('i:-soup-contains("Member")').next_siblings:
    if tag.name == 'i':
        break
    if tag.name == 'a':
        print(tag.text)
Example:
from bs4 import BeautifulSoup
html = '''
<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>:
<a href="/shdb-team/20-739/" class="chip team">SHDB Team</a><a href="/the-spider-society/20-490/" class="chip team">The Spider Society</a><a href="/new-warriors/20-79/" class="chip team">New Warriors</a><a href="/the-six/20-474/" class="chip team">The Six</a>
<i>Member</i>:
<a href="/the-mighty-avengers/20-384/" class="chip team">The Mighty Avengers</a><a href="/new-avengers/20-101/" class="chip team">New Avengers</a><a href="/shield/20-467/" class="chip team">S.H.I.E.L.D.</a><a href="/avengers-resistance/20-154/" class="chip team">Avengers Resistance</a><a href="/marvel-knights/20-377/" class="chip team">Marvel Knights</a><a href="/avengers/20-4/" class="chip team">Avengers</a><a href="/secret-defenders/20-96/" class="chip team">Secret Defenders</a><a href="/daily-bugle/20-216/" class="chip team">Daily Bugle</a><a href="/defenders/20-9/" class="chip team">Defenders</a>
<i>Formerly</i>:
<a href="/future-foundation/20-290/" class="chip team">Future Foundation</a><a href="/heroes-for-hire/20-5/" class="chip team">Heroes For Hire</a><a href="/fantastic-four/20-1/" class="chip team">Fantastic Four</a> </td>
'''
soup = BeautifulSoup(html, "html.parser")
for tag in soup.select_one('i:-soup-contains("Member")').next_siblings:
    if tag.name == 'i':
        break
    if tag.name == 'a':
        print(tag.text)
Output:
The Mighty Avengers
New Avengers
S.H.I.E.L.D.
Avengers Resistance
Marvel Knights
Avengers
Secret Defenders
Daily Bugle
Defenders
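A selector-only variant is also possible, sketched under two assumptions: a Formerly label always follows Member, and soupsieve 2.1 or later is installed (needed for :-soup-contains). It selects every a after Member with the general sibling combinator ~ and drops the ones that also follow Formerly. The tags are compared by identity (id()) because BeautifulSoup treats two distinct tags with the same name, attributes and contents as equal.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (a few teams per label).
html = """
<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>:
<a href="/shdb-team/20-739/" class="chip team">SHDB Team</a><a href="/the-six/20-474/" class="chip team">The Six</a>
<i>Member</i>:
<a href="/new-avengers/20-101/" class="chip team">New Avengers</a><a href="/defenders/20-9/" class="chip team">Defenders</a>
<i>Formerly</i>:
<a href="/fantastic-four/20-1/" class="chip team">Fantastic Four</a> </td>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Every <a> that follows the Member label in the same <td> ...
after_member = soup.select('i:-soup-contains("Member") ~ a')
# ... minus every <a> that follows the Formerly label (compared by identity).
after_formerly = {id(tag) for tag in soup.select('i:-soup-contains("Formerly") ~ a')}

member = [tag.get_text(strip=True) for tag in after_member if id(tag) not in after_formerly]
print(member)
```

This breaks if the label after Member is ever something other than Formerly, so the sibling loop above is the more robust choice when the order of labels can vary.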
Upvotes: 2