I'm scraping some information off MyAnimeList using BeautifulSoup in Python 3. I'm trying to get a show's 'Status', but I'm having trouble accessing it.
Here is the html:
<h2>Information</h2>
<div>
<span class="dark_text">Type:</span>
<a href="https://myanimelist.net/topanime.php?type=movie">Movie</a>
</div>
<div class="spaceit">
<span class="dark_text">Episodes:</span>
1
</div>
<div>
<span class="dark_text">Status:</span>
Finished Airing
</div>
All of this is nested inside another div tag, but I've only included the portion of the HTML that I want to scrape. To clarify, I want to get the text 'Finished Airing' that follows the 'Status:' label.
Here's the code I have so far, but I'm not sure whether this is the best approach or where to go from here:
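from bs4 import BeautifulSoup as soup

# Page_html holds the page source fetched from MyAnimeList
Page_soup = soup(Page_html, "html.parser")
extra_info = Page_soup.find('td', attrs={'class': 'borderClass'})
span_html = extra_info.select('span')
for i in range(len(span_html)):
    if 'Status:' in span_html[i].getText():
        pass  # not sure how to get 'Finished Airing' from here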
Any help would be appreciated, thanks!
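One way the loop above could be finished is to read the matching span's next_sibling, since the value is the bare text node right after the closing </span>. This is a sketch, assuming the block sits inside a td with class borderClass (as the find() call in the question implies) and that the value is a plain text node rather than a link:

```python
from bs4 import BeautifulSoup

# Sample markup from the question, wrapped in the td.borderClass element
# that the question's find() call assumes.
page_html = """
<td class="borderClass">
<h2>Information</h2>
<div>
<span class="dark_text">Type:</span>
<a href="https://myanimelist.net/topanime.php?type=movie">Movie</a>
</div>
<div class="spaceit">
<span class="dark_text">Episodes:</span>
1
</div>
<div>
<span class="dark_text">Status:</span>
Finished Airing
</div>
</td>
"""

page_soup = BeautifulSoup(page_html, "html.parser")
extra_info = page_soup.find("td", attrs={"class": "borderClass"})

status = None
for span in extra_info.select("span"):
    if "Status:" in span.get_text():
        # The value is the text node directly after </span>;
        # strip() removes the surrounding newlines.
        status = span.next_sibling.strip()
        break

print(status)  # Finished Airing
```

Iterating over the spans directly avoids the range(len(...)) indexing, and break stops once the label is found.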
Another solution (maybe):
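# soup is the parsed page, e.g. BeautifulSoup(page_html, "html.parser")
f = soup.find_all('span', attrs={'class': 'dark_text'})
for i in f:
    if i.text == 'Status:':
        # the parent <div> holds both the label and the value, so drop the label
        print(i.parent.text.replace(i.text, '').strip())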
Change 'Status:' to any other label (such as 'Type:' or 'Episodes:') to get that field instead. Hope this helps!
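Instead of changing the label and rerunning, the same parent-based idea can collect every field at once. This is a sketch, assuming each dark_text span shares its parent div only with its value (true for the snippet in the question; other parts of the real page may differ):

```python
from bs4 import BeautifulSoup

html_doc = """
<h2>Information</h2>
<div>
<span class="dark_text">Type:</span>
<a href="https://myanimelist.net/topanime.php?type=movie">Movie</a>
</div>
<div class="spaceit">
<span class="dark_text">Episodes:</span>
1
</div>
<div>
<span class="dark_text">Status:</span>
Finished Airing
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

info = {}
for label in soup.find_all("span", attrs={"class": "dark_text"}):
    label_text = label.get_text(strip=True)              # e.g. "Status:"
    full_text = label.parent.get_text(" ", strip=True)   # e.g. "Status: Finished Airing"
    # Drop the label from the div's text; what remains is the value.
    info[label_text.rstrip(":")] = full_text.replace(label_text, "", 1).strip()

print(info["Status"])  # Finished Airing
```

Because get_text() on the parent includes text inside child tags, this also picks up 'Movie' for Type, whose value is wrapped in an <a>.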
To get the text next to the <span> with "Status:", you can use:
from bs4 import BeautifulSoup
html_doc = """
<h2>Information</h2>
<div>
<span class="dark_text">Type:</span>
<a href="https://myanimelist.net/topanime.php?type=movie">Movie</a>
</div>
<div class="spaceit">
<span class="dark_text">Episodes:</span>
1
</div>
<div>
<span class="dark_text">Status:</span>
Finished Airing
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")
txt = soup.select_one('span:-soup-contains("Status:")').find_next_sibling(string=True)
print(txt.strip())
Prints:
Finished Airing
Or:
txt = soup.find("span", string="Status:").find_next_sibling(string=True)
print(txt.strip())
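If some pages lack a field, soup.find() returns None and the chained .find_next_sibling() raises AttributeError. A minimal sketch of a guarded helper (get_field is a hypothetical name, not part of BeautifulSoup) that returns None instead, and that skips whitespace-only nodes so it also handles Type, whose value is inside an <a>:

```python
from bs4 import BeautifulSoup, Tag

html_doc = """
<h2>Information</h2>
<div>
<span class="dark_text">Type:</span>
<a href="https://myanimelist.net/topanime.php?type=movie">Movie</a>
</div>
<div class="spaceit">
<span class="dark_text">Episodes:</span>
1
</div>
<div>
<span class="dark_text">Status:</span>
Finished Airing
</div>
"""


def get_field(soup, label):
    """Return the first non-blank text after the given label span, or None."""
    span = soup.find("span", class_="dark_text", string=label)
    if span is None:
        return None  # label not present on this page
    for node in span.next_siblings:
        # Tags (like <a>) expose get_text(); bare text nodes are strings.
        text = node.get_text(strip=True) if isinstance(node, Tag) else node.strip()
        if text:
            return text
    return None


soup = BeautifulSoup(html_doc, "html.parser")
print(get_field(soup, "Status:"))  # Finished Airing
print(get_field(soup, "Aired:"))   # None
```

Here string= is used because it is the current name of the older text= argument in BeautifulSoup 4.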