Brent Vaalburg

Reputation: 119

Python BeautifulSoup find next_sibling

I am having trouble with some HTML-scraping code that uses Beautiful Soup. I cannot figure out how to walk through the whole HTML document to find the rest of the items I am looking for.

The code below finds and prints the word "Totem" in the HTML that follows. I want to loop over the HTML and also find the remaining titles, "One, Two, Three" and "Rent".

Code that works to find the first tag and text:

print(html.find('td', {'class': 'play'}).next_sibling.next_sibling.text)

Sample HTML to scrape (the cells I want are marked with **):

<tr>
    <td class="play">

      <a href="#" class="audio-preview"><span class="play-button as_audio-button"></span></a>
        <audio class="as_audio_preview" src="https://shopify.audiosalad.com/"  >foo</audio>

    </td>
    **<td>Totem</td>**
    <!--<td>$0.99</td>-->
    <td class="buy">


  <tr>
    <td class="play">

      <a href="#" class="audio-preview"><span class="play-button as_audio-button"></span></a>
        <audio class="as_audio_preview" src="https://shopify.audiosalad.com/"  >foo</audio>

    </td>
    **<td>One, Two, Three</td>**
    <!--<td>$0.99</td>-->
    <td class="buy">


  <tr>
    <td class="play">

      <a href="#" class="audio-preview"><span class="play-button as_audio-button"></span></a>
        <audio class="as_audio_preview" src="https://shopify.audiosalad.com/"  >foo</audio>

    </td>
    **<td>Rent</td>**
    <!--<td>$0.99</td>-->
    <td class="buy">

Upvotes: 3

Views: 3604

Answers (2)

SIM

Reputation: 22440

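Try this; it should fetch the content you are after:

from bs4 import BeautifulSoup

# content is assumed to hold the HTML from the question as a string
soup = BeautifulSoup(content, "lxml")  # the "lxml" parser requires the lxml package
for item in soup.find_all(class_="play"):
    # find_next_sibling() returns the next sibling tag, skipping the whitespace text in between
    data = item.find_next_sibling().text
    print(data)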

Alternatively, use find_next("td"), which returns the first <td> that follows in document order:

for items in soup.find_all(class_="play"):
    data = items.find_next("td").text
    print(data)

Output:

Totem
One, Two, Three
Rent
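For anyone who wants to run this end to end, here is a minimal self-contained sketch. The `content` string is a trimmed-down, hypothetical stand-in for the question's markup (with closing tags added), and it uses the built-in "html.parser" instead of "lxml" so that no extra package is needed. The names `titles` and `title_cell` are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the HTML in the question.
content = """
<table>
  <tr>
    <td class="play"><a href="#" class="audio-preview"></a></td>
    <td>Totem</td>
    <!--<td>$0.99</td>-->
    <td class="buy"></td>
  </tr>
  <tr>
    <td class="play"><a href="#" class="audio-preview"></a></td>
    <td>One, Two, Three</td>
    <!--<td>$0.99</td>-->
    <td class="buy"></td>
  </tr>
  <tr>
    <td class="play"><a href="#" class="audio-preview"></a></td>
    <td>Rent</td>
    <!--<td>$0.99</td>-->
    <td class="buy"></td>
  </tr>
</table>
"""

soup = BeautifulSoup(content, "html.parser")

titles = []
for cell in soup.find_all("td", class_="play"):
    # Ask explicitly for the next sibling <td>, so whitespace is skipped.
    title_cell = cell.find_next_sibling("td")
    if title_cell is not None:  # guard against a row with no following cell
        titles.append(title_cell.get_text(strip=True))

print(titles)
```

Collecting the values in a list rather than printing inside the loop makes them easier to reuse, and the `None` check avoids an AttributeError if a row has no cell after the "play" cell.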

Upvotes: 1

zelenyjan

Reputation: 703

You have to iterate over all the matching elements, like this:

for td in html.find_all('td', {'class': 'play'}):
    print(td.next_sibling.next_sibling.text)
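As a side note on why `next_sibling` is needed twice: the first sibling after the "play" cell is normally the whitespace between the two tags, not the next `<td>`. The sketch below illustrates this on a small hypothetical snippet (not the asker's full page), parsed with the built-in "html.parser"; `find_next_sibling("td")` is shown as an alternative that should not depend on how the whitespace is laid out.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical two-cell row, modeled on the question's markup.
snippet = """<tr>
    <td class="play"></td>
    <td>Totem</td>
</tr>"""

soup = BeautifulSoup(snippet, "html.parser")
play = soup.find("td", {"class": "play"})

first = play.next_sibling     # whitespace text node between the two cells
second = first.next_sibling   # the <td> that holds the title

print(repr(first))
print(second.text)

# Alternative: ask for the next sibling <td> directly.
robust = play.find_next_sibling("td")
print(robust.text)
```

If the whitespace between the cells were removed, chaining `next_sibling` twice would land on a different node, so the explicit `find_next_sibling("td")` is probably the safer choice.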

Upvotes: 0
