rkx74656

Reputation: 55

Getting specific span tag text in Python (BeautifulSoup)

I'm scraping some information off MyAnimeList using BeautifulSoup on Python 3 and am trying to get a show's 'Status', but I'm having trouble accessing it.

Here is the html:

<h2>Information</h2>
    <div>
        <span class="dark_text">Type:</span>
        <a href="https://myanimelist.net/topanime.php?type=movie">Movie</a>
    </div>
    <div class="spaceit">
        <span class="dark_text">Episodes:</span>
        1
    </div>
    <div>
        <span class="dark_text">Status:</span>
        Finished Airing
    </div>

All of this is contained within another div tag, but I only included the portion of the HTML that I want to scrape. To clarify, I want to obtain the text 'Finished Airing' that follows the 'Status:' span.

Here's the code I have so far but I'm not really sure if this is the best approach or where to go from here:

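from bs4 import BeautifulSoup as soup  # assumed import; the post does not show it

Page_soup = soup(Page_html, "html.parser")
extra_info = Page_soup.find('td', attrs={'class': 'borderClass'})
span_html = extra_info.select('span')
for i in range(len(span_html)):
    if 'Status:' in span_html[i].getText():
        pass  # stuck here: unsure how to get the text after this span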

Any help would be appreciated, thanks!

Upvotes: 3

Views: 7032

Answers (2)

Game Developement

Reputation: 409

Another solution (maybe):

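# `soup` here is the parsed BeautifulSoup object
f = soup.find_all('span', attrs={'class': 'dark_text'})
for i in f:
    if i.text == 'Status:':
        # the parent <div> text includes the 'Status:' label as well as the value
        print(i.parent.text)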

Change 'Status:' to whichever other label you want to find. Hope this helps!
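Since the parent's text contains both the label and the value, one way to keep only the value is to slice the label off. The following is a minimal sketch under that assumption, using a trimmed copy of the HTML from the question; `value_for` is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Trimmed sample markup copied from the question.
html_doc = """
<div class="spaceit">
    <span class="dark_text">Episodes:</span>
    1
</div>
<div>
    <span class="dark_text">Status:</span>
    Finished Airing
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def value_for(soup, label):
    """Hypothetical helper: text that follows the dark_text span equal to `label`."""
    for span in soup.find_all("span", attrs={"class": "dark_text"}):
        if span.text == label:
            # The parent's text is label + value; strip whitespace, then drop the label.
            combined = span.parent.get_text(strip=True)
            return combined[len(label):].strip()
    return None  # no span with that label was found


status = value_for(soup, "Status:")
print(status)
```

Returning `None` for a missing label lets the caller decide how to handle pages where a field is absent.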

Upvotes: 2

Andrej Kesely

Reputation: 195408

To get the text next to the <span> with "Status:", you can use:

from bs4 import BeautifulSoup

html_doc = """
<h2>Information</h2>
    <div>
        <span class="dark_text">Type:</span>
        <a href="https://myanimelist.net/topanime.php?type=movie">Movie</a>
    </div>
    <div class="spaceit">
        <span class="dark_text">Episodes:</span>
        1
    </div>
    <div>
        <span class="dark_text">Status:</span>
        Finished Airing
    </div>
"""

soup = BeautifulSoup(html_doc, "html.parser")
txt = soup.select_one('span:-soup-contains("Status:")').find_next_sibling(string=True)
print(txt.strip())

Prints:

Finished Airing

Or:

txt = soup.find("span", string="Status:").find_next_sibling(string=True)
print(txt.strip())
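Both variants raise an AttributeError if no matching span exists, and for a row like "Type:" the first text sibling is only whitespace because the value sits inside an `<a>` tag. A possible way around both, sketched here with a hypothetical helper named `field_text`, is to check for `None` and join every sibling that follows the span:

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed sample markup copied from the question.
html_doc = """
<div>
    <span class="dark_text">Type:</span>
    <a href="https://myanimelist.net/topanime.php?type=movie">Movie</a>
</div>
<div>
    <span class="dark_text">Status:</span>
    Finished Airing
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def field_text(soup, label):
    """Hypothetical helper: text after the span labelled `label`, or None."""
    span = soup.find("span", string=label)
    if span is None:
        return None  # label not present on this page
    parts = []
    for sib in span.next_siblings:
        # Siblings are either plain text nodes or tags such as <a>.
        if isinstance(sib, NavigableString):
            parts.append(str(sib))
        else:
            parts.append(sib.get_text())
    return "".join(parts).strip()


print(field_text(soup, "Status:"))
print(field_text(soup, "Type:"))
```

This assumes each label span and its value share the same parent element, as in the HTML shown in the question.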

Upvotes: 3
