Adam

Reputation: 325

Finding strings with Beautiful Soup within classes

I'm working on an assignment for class. We have to scrape information about an online book list that looks something like this:

<p class="css-38z03z"><strong>1. <a data-link-name="in body link" href="https://www.theguardian.com/books/2016/feb/01/100-best-nonfiction-books-of-all-time-the-sixth-extinction-elizabeth-kolbert">The Sixth Extinction by Elizabeth Kolbert (2014)</a> </strong><br/> An engrossing account of the looming catastrophe caused by ecology’s “neighbours from hell” – mankind.</p>

What I need to do is use Beautiful Soup to extract the second half of that HTML blurb, i.e. the text after the <br/> tag. I need my output to be "An engrossing account of the looming catastrophe caused by ecology’s “neighbours from hell” – mankind."

Here's the closest I can get (which isn't very close.)

soup_doc.find('p').strong
plz_work = soup_doc.strong.next_sibling
plz_work.get_text

I've tried using other variants of the sibling attributes, but no luck. What should I do?

Upvotes: 1

Views: 599

Answers (2)

sarartur

Reputation: 1228

This works for this particular example, but I'm not sure it is robust across the whole page you are working with. It takes the full text of the <p> and removes the text of the <strong> element from it.

from bs4 import BeautifulSoup

html = """
    <p class="css-38z03z">
        <strong>1. 
            <a data-link-name="in body link" href="https://www.theguardian.com/books/2016/feb/01/100-best-nonfiction-books-of-all-time-the-sixth-extinction-elizabeth-kolbert">The Sixth Extinction by Elizabeth Kolbert (2014)
            </a> 
        </strong>
        <br/> An engrossing account of the looming catastrophe caused by ecology’s “neighbours from hell” – mankind.
    </p>"""


soup = BeautifulSoup(html, 'html.parser')

element_all = soup.find('p').text
element_unwanted = soup.find('strong').text
if element_unwanted in element_all:
    element = element_all.replace(element_unwanted, '').strip()
    print(element)
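
One possible weak spot of the string replacement above is that str.replace removes every occurrence of the <strong> text, so it could misfire if that text also appeared inside the blurb. A minimal sketch of an alternative, assuming it is acceptable to modify the parsed tree, is to drop the <strong> element with decompose() and read what is left. The variable names here are only illustrative.

```python
from bs4 import BeautifulSoup

html = '''<p class="css-38z03z"><strong>1. <a data-link-name="in body link" href="https://www.theguardian.com/books/2016/feb/01/100-best-nonfiction-books-of-all-time-the-sixth-extinction-elizabeth-kolbert">The Sixth Extinction by Elizabeth Kolbert (2014)</a> </strong><br/> An engrossing account of the looming catastrophe caused by ecology’s “neighbours from hell” – mankind.</p>'''

soup = BeautifulSoup(html, "html.parser")

p = soup.find("p")
# Remove the <strong> element (number, link and title) from the tree.
p.find("strong").decompose()
# Only the text after <br/> is left; strip=True trims the leading space.
blurb = p.get_text(strip=True)
print(blurb)
```

Note that decompose() changes the soup in place, so extract the title first if you need it as well.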

Upvotes: 1

MendelG

Reputation: 20018

Simply select the <br/> tag and use .next to get the text node that follows it:

from bs4 import BeautifulSoup

    
html = '''<p class="css-38z03z"><strong>1. <a data-link-name="in body link" href="https://www.theguardian.com/books/2016/feb/01/100-best-nonfiction-books-of-all-time-the-sixth-extinction-elizabeth-kolbert">The Sixth Extinction by Elizabeth Kolbert (2014)</a> </strong><br/> An engrossing account of the looming catastrophe caused by ecology’s “neighbours from hell” – mankind.</p>
'''

soup = BeautifulSoup(html, "html.parser")
print(soup.select_one('.css-38z03z br').next)

Output:

An engrossing account of the looming catastrophe caused by ecology’s “neighbours from hell” – mankind.
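
To collect the blurb for every entry in the list, the same idea can be put in a loop. This is only a sketch: it assumes every entry uses the same css-38z03z class and puts its blurb directly after a <br/>, which may not hold for the whole page. The two-entry HTML and the extract_blurbs helper are made up for illustration, and it uses next_sibling, which stays inside the same <p>.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical two-entry sample in the same shape as the question's HTML.
html = """
<p class="css-38z03z"><strong>1. <a href="#">First Book (2014)</a> </strong><br/> First blurb.</p>
<p class="css-38z03z"><strong>2. <a href="#">Second Book (2012)</a> </strong><br/> Second blurb.</p>
"""


def extract_blurbs(markup):
    """Return the text that directly follows each <br/> in the matching <p> tags."""
    soup = BeautifulSoup(markup, "html.parser")
    blurbs = []
    for br in soup.select("p.css-38z03z br"):
        node = br.next_sibling
        # Skip cases where <br/> is followed by another tag or by nothing.
        if isinstance(node, NavigableString):
            blurbs.append(str(node).strip())
    return blurbs


blurbs = extract_blurbs(html)
print(blurbs)
```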

Upvotes: 1
