rStorms

Reputation: 1115

Beautiful Soup: Get specific text that has no specific class

I am trying to get the highlighted text "frei ab 01.05.2017" below. The problem, however, is that the class "section_content iw_right" occurs 19 times on that page. I could call find_all and take only the 11th element, but some of the pages I want to scrape contain a different number of elements with that class, so I might not always get the right one. Any ideas? Thanks!

[Image: screenshot with the target text highlighted]

Upvotes: 1

Views: 1853

Answers (2)

alecxe

Reputation: 474071

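One way to reach the desired element is through the preceding label: locate the span element with the text "Erdgeschoss" and find its next strong sibling:

# Locate the label span, then read the <strong> element that follows it.
label = soup.find("span", string="Erdgeschoss")  # string= is the newer name for text= (bs4 4.4+)
print(label.find_next_sibling("strong").get_text())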
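
For a self-contained illustration of the label-based lookup, here is a minimal sketch. The markup is hypothetical (the real page is not shown, so the structure is an assumption), and value_after_label is a helper name introduced only for this example. It returns None instead of raising when the label or the value is missing, which may help on pages with a different layout:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class repeats, so only the label tells the blocks apart.
html_doc = """
<div class="section_content iw_right">
  <p><span>Etage</span> <strong>1</strong></p>
</div>
<div class="section_content iw_right">
  <p><span>Erdgeschoss</span> <strong>frei ab 01.05.2017</strong></p>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def value_after_label(soup, label_text):
    """Return the text of the <strong> following the <span> label, or None."""
    label = soup.find("span", string=label_text)
    if label is None:
        return None  # label not present on this page
    value = label.find_next_sibling("strong")
    if value is None:
        return None  # label found, but no <strong> sibling after it
    return value.get_text(strip=True)


result = value_after_label(soup, "Erdgeschoss")
print(result)
```

Because the lookup keys on the label text rather than on the position of the class, it should not matter how many "section_content iw_right" blocks a given page contains.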

Upvotes: 1

Satish Prakash Garg

Reputation: 2233

You can use lxml, which can be an order of magnitude faster than BeautifulSoup.

The following code should help you achieve the desired result.

from lxml import html
html_string = """
    <div class="clear">
        <div class="section_content iw_right">
            <p>
            <span>
            </span>
            <strong>hello</strong>
            <br>
            <strong>gen</strong>
            </p>
        </div>
    </div>

    <div class="clear">
        <p>
        <span>
        </span>
        <strong>hello1</strong>
        <br>
        <strong>gen1</strong>
        </p>
    </div>
"""
root = html.fromstring(html_string)
# For each div with class "clear", take the text of the first <strong> under any descendant <p>.
r_xp = [elem.xpath('.//p/strong/text()')[0] for elem in root.xpath('//div[@class="clear"]')]
print(r_xp)

Note the absence of the div with class "section_content iw_right" from the second div in the example html_string.

The above code prints:

['hello', 'hello1']
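
If the target value follows a known label, such as the "Erdgeschoss" span from the question, the XPath can anchor on that label instead of on the position of the div. This is only a sketch: the markup below is hypothetical, since the real page structure is not shown, and the variable names are chosen for illustration.

```python
from lxml import html

# Hypothetical markup; the actual page layout is an assumption.
html_string = """
<div class="clear">
  <div class="section_content iw_right">
    <p><span>Etage</span> <strong>1</strong></p>
  </div>
  <div class="section_content iw_right">
    <p><span>Erdgeschoss</span> <strong>frei ab 01.05.2017</strong></p>
  </div>
</div>
"""

root = html.fromstring(html_string)

# Select the first <strong> sibling that follows the <span> whose text is the label.
matches = root.xpath(
    '//span[normalize-space()="Erdgeschoss"]/following-sibling::strong[1]/text()'
)

# xpath() returns a list; guard against pages where the label is missing.
availability = matches[0].strip() if matches else None
print(availability)
```

Checking for an empty result list avoids the IndexError that a bare [0] would raise on pages without the label.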

Upvotes: 1
